Distributed priority queue with duplication check


#1

I’m exploring UDF and Large Data Type to see if it is possible to build a queue with the following desirable requirement:

  • prioritized
  • no duplication for task id

The idea is implement with Large List and Large Map

  • Large List: store Priority+separator+TaskID as a string value. As it’s sorted, the first item will be the highest priority task.
  • Large Map: store TaskID as key, and task content as value

Then there are two UDF APIs:

  • add(TaskID, Priority, TaskDetails): check the map for duplication, if not, then add to the LList and LMap.
  • poll(): scan the list for the first time, split the value for TaskID, and get task content from LMap, then remove from LList and LMap

In theory it should work. A problem is only that the [LLIST API] (https://github.com/aerospike/aerospike-lua-core/blob/master/src/external/llist.lua) doesn’t provide a way for me to get the first N items easily. I think the scan api should have a way to limit to top 1%, (haven’t figured it out yet) but ideally, a “LIMIT” keywords that give me only the first N records.

Anyway comment or idea? Thanks in advance.


#2

HI, I think the function you want for LLIST is the range() function. You can give range() a min and max value. If min is nil, then it will start with the lowest value. If max is nil, it will go to the largest value. If both are nil, then it effectively becomes a scan. The range() function is in the current released server in the Lua functions, but has not yet been added to all of the client APIs. However, for any new LDT function, you can always invoke it directly as a UDF.

We don’t (yet) have a way to say “N Items”. The range() function works only with values at the moment.

By the way – when we implement the “Scoreboard” feature, it will be almost exactly as you describe. Similarly, large capped collections (using the LLIST to manage the timestamps) will work that way as well.

As we look at the various ways that LLIST will be employed, it is becoming more obvious which features (functions) that we need to add to each of the LDTs.


#3

I’m not sure how I did it – but somehow I got logged in as “old forum” when I went to reply to this post.

Anyway – this is me. Toby from Aerospike.


#4

Thx Toby

to my understanding, scan/range return a list, but not a iterator, right? i.e. even if I want to want to pick the first item and then exit, the database still scan for every record

looking at the lib_llist code, I found the take_min Object = llist.take_min(topRec,ldtBinName, src) function that seems to be exactly what I want. unfortunately it’s “under construction” :smile:


#5

It’s almost ready. How soon to you need it?

tjl


#6

don’t worry about my schedule. take your time and I’ll give it a try when it is ready. right now I’m maintaining the queue in a Java app, and once everything in AeroSpike is ready to define a solution, I’ll switch to it.

I suppose llist.take_min() guarantees no duplicated result for concurrent takers, and it takes the first item from a cluster instead of a node. Indeed, even if it takes only the first item from the local node, it is useful in some use cases already. (probably perform faster but very confusing to user)

Edited: previously, i didn’t check if the LMAP does exist and try to get a record from it and got “1401:LDT-Item Not Found” error. with a proper checking as follows, it works

if rec['_map'] == nil or pcall(function() return lmap.get(rec,'_map',key)==nil end) then ...

LSET has a exists() function but LMAP doesn’t have a contains_key(key). for my case as my value is small, there is probably no performance difference anyway

I think AeroSpike is really an amazing piece of software. as it is so fast, it enables many design option. :smile:


#8

mingfai,

Version >= 3.5.9 now has

  • take_first(count)
  • take_last(count) returns result in reverse
  • take_from(key, count)

– R


#9

@mingfai,

You can view the release notes for our latest server release (3.5.9) here and download it here.