LDT llist scan returning deleted elements

ldt
udf

#1

I’m able to consistently break llists using Lua like this:

function llist_break(rec, value, ts, old_ts)
  -- remove the old value
  llist_lib.remove(rec, 'conv_list', old_ts)
  -- make a new map
  local item = map {key = ts, value = value}
  -- add the item with the new value
  llist_lib.add(rec, 'conv_list', item)

  -- store the item count reported by size()
  rec['conv_count'] = llist_lib.size(rec, 'conv_list')
  -- store the length of a full scan; it should match size(), but it does not
  rec['conv_scan'] = #llist_lib.scan(rec, 'conv_list')

end

So if I have items in a list keyed by timestamp, and I delete them and then reinsert them with a new key (basically changing the order), the deleted items are still returned by the scan.

Calling llist.config when it breaks, I see:

{ CompactList: 
  [ [Object],
    [Object] ],
 RootDigestList: [],
 PropEsrDigest: 0,
 SUMMARY: 'LList Summary',
 PropRecType: 1,
 PropMagic: 'MAGIC',
 NodeCount: 0,
 PropBinName: 'conv_list',
 StoreState: 'C',
 PropParentDigest: null,
 PropLdtType: 'LLIST',
 PropVersion: 2,
 PropItemCount: 1,
 RootKeyList: [],
 PropSubRecCount: 0,
 PropSelfDigest: 
  { '0': 19,
    '1': 137,
    '2': 90,
    '3': 24,
    '4': 180,
    '5': 84,
    '6': 207,
    '7': 50,
    '8': 180,
    '9': 123,
    '10': 155,
    '11': 254,
    '12': 120,
    '13': 65,
    '14': 13,
    '15': 173,
    '16': 22,
    '17': 119,
    '18': 219,
    '19': 63,
    length: 20,
    parent: undefined },
 TreeLevel: 1,
 LeafCount: 0,
 PageSize: null } }

Notice that CompactList contains 2 records while PropItemCount shows 1.


#2

From aql, here’s what it looks like once the list is broken:

aql>  execute llist.scan('conv_list') on store_disk.test-llist where pk = 'test'
+------------------------------------------------------------------------------------+
| scan                                                                               |
+------------------------------------------------------------------------------------+
| [{"value":"val", "key":1444862968784649}, {"value":"val", "key":1444862968889253}] |
+------------------------------------------------------------------------------------+


aql>  execute llist.size('conv_list') on store_disk.test-llist where pk = 'test'
+------+
| size |
+------+
| 1    |
+------+
1 row in set (0.000 secs)

Size can’t be 1 if scan is returning 2 items.


#3

Some more details from playing around.

It seems that a remove happening in very close proximity to an add results in records being orphaned (they still appear in the CompactList despite being deleted). The order doesn’t matter either (remove first or add first).

As a result, some really odd behavior happens when you have these orphaned records.

Suppose:

Record 1 - orphaned (remove occurred but the record is still shown in the CompactList)
Record 2 - added, never removed

The PropItemCount is 1, and a scan, find, etc. can retrieve both records.

If you remove Record 2, size() now returns 0 instead of 1, and scan, find, etc. return no results. config(), however, still shows the orphaned record. I suppose that internally the llist doesn’t actually scan or attempt to find anything if it thinks there are no records. So you run into a situation where you have a supposedly empty list, you add a single item, and then dozens of items appear because they were secretly there the whole time.

You can subsequently delete the orphaned record. The problem, of course, is detecting when a remove has only decremented PropItemCount without actually removing the item, so that you can attempt it again.
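One rough way to detect the mismatch is to compare size() against the length of a full scan, as the first post does. The sketch below assumes a hypothetical helper named count_drift; `lib` is passed in so the function can be exercised with a stub, and inside a UDF it would be llist_lib. It relies only on the size and scan calls already used in this thread. Note the limitation described above: once size() reports 0, scan appears to return nothing, so this check cannot see orphans in a list that claims to be empty.

```lua
-- Sketch only: count_drift is a hypothetical helper, not part of the LDT library.
-- `lib` is expected to be llist_lib when called from a UDF.
-- Returns (scanned length - reported size); anything other than 0 means the
-- counter and the stored items disagree.
local function count_drift(lib, rec, bin)
  local reported = lib.size(rec, bin)
  local scanned = #lib.scan(rec, bin)
  return scanned - reported
end
```

With the broken list from post #2 (size 1, scan length 2), this would return 1.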

We’re going to attempt to break the add/remove into completely separate UDFs to see if the delay solves the issue in our test suites. In somewhat more limited testing this seems to work.


#4

OK, found the issue, and it’s quite simple: remove breaks the entire llist if you call it on an element that doesn’t exist:

aql>  execute llist.add('conv_list', 1) on store_disk.test1 where pk = 'test'
+-----+
| add |
+-----+
| 0   |
+-----+
1 row in set (0.000 secs)

aql>  execute llist.size('conv_list') on store_disk.test1 where pk = 'test'
+------+
| size |
+------+
| 1    |
+------+
1 row in set (0.000 secs)

aql>  execute llist.remove('conv_list', 2) on store_disk.test1 where pk = 'test'
+--------+
| remove |
+--------+
| 0      |
+--------+
1 row in set (0.001 secs)

aql>  execute llist.size('conv_list') on store_disk.test1 where pk = 'test'
+------+
| size |
+------+
| 0    |
+------+
1 row in set (0.000 secs)


aql>  execute llist.config('conv_list') on store_disk.test1 where pk = 'test'
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| config                                                                                                                                                                                                                                                         |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {"CompactList":[1], "RootDigestList":[], "PropEsrDigest":0, "SUMMARY":"LList Summary", "PropRecType":1, "PropMagic":"MAGIC", "NodeCount":0, "PropBinName":"conv_list", "StoreState":"C", "PropParentDigest":NIL, "PropLdtType":"LLIST", "PropVersion":2, "Prop |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.000 secs)

We were inadvertently calling remove on elements that had already been removed.

Remove needs to only decrement the size if it actually removes an item!
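Until remove behaves that way, a possible workaround is to call it only when the key is known to be present. The following is a minimal sketch, not a tested fix: has_key and safe_remove are hypothetical names, `lib` stands in for llist_lib, and the code assumes the scan result supports `#` and integer indexing and that items are maps with a `key` field, as in the first post.

```lua
-- Sketch only: has_key and safe_remove are hypothetical helpers.
-- `lib` is expected to be llist_lib when called from a UDF.

-- Return true if any scanned item carries the given key.
-- Assumes items look like {key = ..., value = ...}.
local function has_key(items, key)
  for i = 1, #items do
    if items[i]['key'] == key then
      return true
    end
  end
  return false
end

-- Call remove only when a scan shows the key is present, so remove is
-- never issued for a missing element.
local function safe_remove(lib, rec, bin, key)
  local items = lib.scan(rec, bin)
  if not has_key(items, key) then
    return false  -- nothing to remove; leave the list untouched
  end
  lib.remove(rec, bin, key)
  return true
end
```

The trade-off is a full scan before every remove, which gets expensive on large lists.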

The only way to fix an existing list is to scan it, remove all the items, then use add_all to add them back in, because there is no other way to force the size property back in sync with the actual size. That is a very expensive proposition for having inadvertently called remove once.
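As a rough illustration of that repair, the sketch below scans the list, removes each item by key, and adds the items back with add_all. The name rebuild is hypothetical, `lib` stands in for llist_lib, and items are assumed to be maps with a `key` field. Whether this sequence really resynchronizes the counter is this thread's observation and has not been verified here, and it cannot help once size() is 0 and scan returns nothing.

```lua
-- Sketch only: rebuild is a hypothetical helper, not part of the LDT library.
-- `lib` is expected to be llist_lib when called from a UDF.
-- Returns the number of items that were re-added.
local function rebuild(lib, rec, bin)
  local items = lib.scan(rec, bin)        -- snapshot of what is actually stored
  for i = 1, #items do
    lib.remove(rec, bin, items[i]['key']) -- remove every item that really exists
  end
  if #items > 0 then
    lib.add_all(rec, bin, items)          -- re-add the snapshot in one call
  end
  return #items
end
```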


#5

@courtneyc,

Thank you for posting about LDTs in our forum. Please see the LDT Feature Guide for current LDT recommendations and best practices.


#6

@courtneyc,

Effective immediately, we will no longer actively support the LDT feature and will eventually remove the API. The exact deprecation and removal timeline will depend on customer and community requirements. Instead of LDTs, we advise that you use our newer List and SortedMap APIs, which are now available in all Aerospike-supported clients at the General Availability level. Read our blog post for details.