Unique values in lset


#1

I’m using the lset large data type to store a large number of unique integers, with repeated attempts to store some of them. It is effectively storing a list of ids that have been seen recently.

However, any time the same id is stored again, an “LDT-Unique Key or Value Violation” error is raised. I hardly consider this an error, and it generates a large volume of meaningless logs. Is there a way to prevent this from being output without disabling UDF logs altogether? I could always check for existence before attempting a put, but that seems very inefficient.

Also, what are the scaling limits of this? The documentation seems to suggest a maximum set size of 2 GB, but the DEFAULT_SMALL_CAPACITY setting is only 500k. If dealing with, say, 8-byte values, should I expect to be able to fit 10^7 (or even 10^8) entries in the set, or is that not going to work?

Finally, what is the recommended way to insert large amounts of data quickly? Should we build our own asynchronous wrapper and maintain many connections, or is there a better way to do this? (Neither the execute call nor the LargeSet classes appear to have asynchronous versions.)


#2

You can do this through a UDF, so the existence check avoids an extra client round trip:

local lset = require('ldt/lib_lset');

-- Add the value only if it is not already present, and report which
-- branch was taken so the caller can tell the two cases apart.
function check_and_set(rec, lSetBinName, value)
   if (lset.ldt_exists(rec, lSetBinName) == 1 and lset.exists(rec, lSetBinName, value) == 1) then
      return "Found";      -- bin exists and already contains the value
   else
      lset.add(rec, lSetBinName, value);
      return "Inserted";   -- first time this value has been seen
   end
end

Register the function in a UDF module (named chaman here) and invoke it from aql:


aql> execute chaman.check_and_set('lSetBinName', 1) on test where PK='1'
+---------------+
| check_and_set |
+---------------+
| "Inserted"    |
+---------------+
1 row in set (0.000 secs)

aql> execute chaman.check_and_set('lSetBinName', 1) on test where PK='1'
+---------------+
| check_and_set |
+---------------+
| "Found"       |
+---------------+
1 row in set (0.000 secs)

aql> execute chaman.check_and_set('lSetBinName', 2) on test where PK='1'
+---------------+
| check_and_set |
+---------------+
| "Inserted"    |
+---------------+
1 row in set (0.000 secs)

Also, what are the scaling limits of this?

LSET is a single-level hash table, and the maximum size of an overflow bucket is bounded by the record size. What size of integer are you planning to store? As a rough estimate, with 8-byte values, 10^7 entries is about 80 MB of raw data and 10^8 is about 800 MB, both under the 2 GB record limit, though per-entry overhead reduces the practical capacity. An alternative would be LLIST, which is an unbounded B+ tree (and, at this point in the dev cycle, a much more stable data structure).
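If you want to try LLIST, here is a minimal sketch of an equivalent insert UDF. It assumes lib_llist exposes an add function analogous to lset.add, and the function name llist_add is just for illustration:

local llist = require('ldt/lib_llist');

-- Sketch: insert a value into an LLIST bin. LLIST keys are also unique,
-- so a duplicate insert should fail the same way an lset duplicate does.
function llist_add(rec, binName, value)
   llist.add(rec, binName, value);
   return 0;
end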

Finally, what is the recommended way to insert large amounts of data fast?

Batching multiple put requests into a single call to Aerospike should speed things up.
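One way to do that on the UDF side is to send many values in one execute call and loop over them server-side. This is only a sketch, assuming the client can pass a list-valued argument; add_many is an illustrative name:

local lset = require('ldt/lib_lset');

-- Sketch: add every value from one UDF call instead of one call per value.
-- 'values' arrives as an Aerospike list; values already present are skipped,
-- so the uniqueness violation is never triggered.
function add_many(rec, lSetBinName, values)
   local inserted = 0;
   for v in list.iterator(values) do
      if (lset.ldt_exists(rec, lSetBinName) ~= 1 or lset.exists(rec, lSetBinName, v) ~= 1) then
         lset.add(rec, lSetBinName, v);
         inserted = inserted + 1;
      end
   end
   return inserted;   -- number of values actually added
end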

– R


#3

Thank you for your response.

I guess the UDF solution is workable, though needing it doesn’t seem to make much sense, as I’d say it’s fairly normal to use sets the way we are. I will also try using llist to see if it offers better performance.

Batching multiple put requests is not an option, since we’re inserting data in real time from logs as they come in through our log aggregation system. I will, however, look into whether there is any way to accumulate multiple entries and batch them into a single insert.


#4

I hear you. Thanks for the feedback.

– R


#5

@juhanic,

Thank you for posting about LDTs in our forum. Please see the LDT Feature Guide for current LDT recommendations and best practices.


#6

@juhanic,

Effective immediately, we will no longer actively support the LDT feature and will eventually remove the API. The exact deprecation and removal timeline will depend on customer and community requirements. Instead of LDTs, we advise that you use our newer List and SortedMap APIs, which are now available in all Aerospike-supported clients at the General Availability level. Read our blog post for details.
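For reference, here is a minimal sketch of the earlier check-and-set pattern rewritten against an ordinary list bin instead of an LDT. It uses plain record UDF calls rather than the client-side List API this post recommends, the name check_and_set_list is illustrative, and the linear scan is for clarity rather than performance:

-- Sketch: the same check-and-set semantics on a native list bin.
function check_and_set_list(rec, binName, value)
   local l = rec[binName];
   if l == nil then
      l = list();
   end
   for v in list.iterator(l) do
      if v == value then
         return "Found";
      end
   end
   list.append(l, value);
   rec[binName] = l;
   -- persist the modified record
   if aerospike:exists(rec) then
      aerospike:update(rec);
   else
      aerospike:create(rec);
   end
   return "Inserted";
end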