Maximum size for Large Ordered List (llist)


#1

What is the maximum size of a Large Ordered List? Is it normal to store 5 million records in it?


#2

Max83,

Large list is unbounded by design.

The only operational limitation is that LuaJIT has a bound on how much memory it can consume at run time (2G), so some operations, such as scan, may fail.

– R


#3

Max83,

What are you trying to do? :)

– R


#4

I need to sort many millions of links per domain; each domain has its own collection of links sorted by priority. Main purpose: select 5-10 links from each domain every second. Is it possible?


#5

Max83,

Domain1 -> Llist of links sorted by priority

Domain2 -> Llist of links sorted by priority

Domain3 ->

DomainN

Execute llist.find_first('binname', 10) where PK=Domain1

Is this the correct representation? How many domains do you have?

– R
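The layout above (one ordered list per domain key, read with find_first) can be sketched in plain Python. This is only an in-memory model of the semantics, not the Aerospike client API: the `DomainLinks` class, its method names, and the rule that a lower number means a higher priority are all assumptions made for illustration.

```python
import bisect
from collections import defaultdict


class DomainLinks:
    """In-memory stand-in for 'one llist per domain', ordered by priority.

    Hypothetical helper for illustration; not an Aerospike API.
    """

    def __init__(self):
        # domain (the primary key) -> list of (priority, link), kept sorted
        self._lists = defaultdict(list)

    def add(self, domain, priority, link):
        # insort keeps the per-domain list ordered on every insert
        bisect.insort(self._lists[domain], (priority, link))

    def find_first(self, domain, count):
        # return the first `count` entries without removing them
        return self._lists.get(domain, [])[:count]


store = DomainLinks()
store.add("Domain1", 5, "http://example.com/c")
store.add("Domain1", 1, "http://example.com/a")
store.add("Domain1", 3, "http://example.com/b")
store.add("Domain2", 2, "http://example.org/x")

# Assumption: lower number sorts first, i.e. is served first.
top = store.find_first("Domain1", 2)
```

Every read and write for a domain goes through that domain's single key, which is why a small number of keys can become hot under load.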


#6

Yes, the representation is correct. I have about 40,000 domains. Each domain has from hundreds of thousands to millions of links.


#7

Max83,

If you are sorting the list by priority, how does updating a priority work? You cannot look up the link in the list and move it from one position to another. Do you plan to just add the link with its new priority at the beginning?

On the data modelling: with 40K domains, depending on the throughput requirement (read/write/update), the keys may become hot.

– R


#8

New incoming links will have a random priority, so they should be placed at different positions. Is that OK for Aerospike?

P.S. All operations are required (read, write and update).


#9

So the same link cannot show up twice? If not, and you only need the top 10, why do you need to store a million?

– R


#10

Each query will get the next 10 links. To skip the previous ones, I will use a filter to take only 10 unvisited links and then mark them as visited.

Will it work?


#11

Should work !!!

Remember there is a set of take functions as well:

llist.take_first
llist.take_last
llist.take_from

These help if you do not need the links that are already visited.

Keep in mind that with only 40k keys, they may present themselves as hot keys.

– R


#12

Am I right to assume that using remove_first on five records and then find_first on 5 records will be faster than using a UDF with a filter to extract unvisited links?

I hope this is the last question :)


#13

Yes !!

take = find + remove

If you are not going to look at visited links, then storing them in the list seems to have no utility.

– R
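The "take = find + remove" relationship can be sketched as follows. This is a plain-Python model of the semantics only, assuming a list kept in ascending order; the `OrderedList` class and its methods are hypothetical and are not the llist implementation.

```python
import bisect


class OrderedList:
    """Toy ordered list used to illustrate take = find + remove."""

    def __init__(self, items=()):
        self._items = sorted(items)

    def find_first(self, count):
        # read-only: returns a copy of the first `count` items
        return self._items[:count]

    def remove(self, item):
        # binary search for the item, delete it if present
        i = bisect.bisect_left(self._items, item)
        if i < len(self._items) and self._items[i] == item:
            del self._items[i]

    def take_first(self, count):
        # take = find + remove: read the head, then delete what was read
        found = self.find_first(count)
        for item in found:
            self.remove(item)
        return found

    def size(self):
        return len(self._items)


links = OrderedList((p, f"link-{p}") for p in [4, 1, 3, 2, 5])

batch1 = links.take_first(2)  # the two lowest-priority-number links
batch2 = links.take_first(2)  # the next two; no "visited" flag is needed
```

Because taken items leave the list, each call starts at the current head and no filter has to skip over visited entries.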


#14

Raj, thanks a lot, the information was very helpful.


#15

A similar question here. We have a use case for an llist that may easily grow to 100 billion entries and beyond (can provide more details by email if needed). We assume that if we can sustain 10,000 find()s per second on it, everything should be fine (it’s only queried from offline processing). The entries consist of a 64-bit key and a potential list of 32-bit IDs. Only about 10% of the entries should contain more than 1 ID in that list.

We now ask ourselves what that amount will implicitly mean to our cluster and what kind of architecture would be the best:

  • A huge LDT must be stored by one server only, right? Only manual sharding possible?
  • We once found a UDF transform function that could be applied manually to bring entries into a compact form and back, but it’s gone from the new docs. Does that still exist (so we could go down to 12 bytes/entry + x bytes overhead per SSD block used)?

We currently plan to set the 4-byte value to either the first ID of the list or 0, where the latter means that ‘there is a list’ and that the list can be queried from a second, smaller LDT (same 64-bit primary) or from a dedicated entry. This way we hope to be able to limit the entry size on the ‘big’ LDT to 12 bytes/entry with maxObjectSize and such.

Cheers, Manuel
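The 12-byte entry described above (64-bit primary plus a 4-byte value, with 0 meaning "there is a list") can be sketched with Python's struct module. This is a minimal illustration under stated assumptions: little-endian packing, at least one ID per entry, and 0 never being a real ID. The `encode` and `decode` helpers are hypothetical and unrelated to any Aerospike transform function.

```python
import struct

# "<QI" = little-endian, unsigned 64-bit key + unsigned 32-bit value, no padding
ENTRY = struct.Struct("<QI")
HAS_LIST = 0  # assumption: 0 is never a valid ID, so it can act as a marker


def encode(key, ids):
    # a single ID is stored inline; several IDs are stored elsewhere
    value = ids[0] if len(ids) == 1 else HAS_LIST
    return ENTRY.pack(key, value)


def decode(blob):
    key, value = ENTRY.unpack(blob)
    # None tells the caller to look the IDs up in the secondary store
    return key, (None if value == HAS_LIST else value)


single = encode(0x1122334455667788, [42])
multi = encode(0x1122334455667788, [42, 43])

# Raw payload for 100 billion entries, before any per-block overhead
raw_bytes = 100_000_000_000 * ENTRY.size
```

At 12 bytes per entry, 100 billion entries come to 1.2 × 10^12 bytes of raw payload, excluding the per-block overhead and the secondary store for multi-ID entries.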


#16

@max83 and @ManuelSchmidt:

Thank you for posting about LDTs in our forum. Please see the LDT Feature Guide for current LDT recommendations and best practices.


#17

@max83 and @ManuelSchmidt,

Effective immediately, we will no longer actively support the LDT feature and will eventually remove the API. The exact deprecation and removal timeline will depend on customer and community requirements. Instead of LDTs, we advise that you use our newer List and SortedMap APIs, which are now available in all Aerospike-supported clients at the General Availability level. Read our blog post for details.