Pagination in fetched result set from Aerospike


#1

I have millions of records in an Aerospike cluster and I want to fetch them in pages: the first 1000, then 1001-2000, 2001-3000, and so on. Is it possible to achieve this with the Aerospike Java client library? If yes, please give me some pointers.


#2

There is no pagination capability in Aerospike today. Can you tell us more about your exact use case? Perhaps there is another way to achieve the same thing.

Peter


#3

The use case is simple: we are loading a lot of records with a TTL into Aerospike. Currently there are millions, and in the future there could be billions. We then need to read all the data from Aerospike and send it to another process over HTTP in batches. Currently we load all the data into the JVM and paginate in Java code, but this will not scale as the data approaches billions of records. So we are thinking of fetching records in batches/pages from Aerospike itself so that the JVM process does not run out of memory.


#4

Hi

You can find an example of how to process all the records in a namespace and (optionally) a set at:

https://github.com/helipilot50/aerospike-batch-processing.git

This example uses the scanAll() method in the Aerospike Java client to batch a number of records (1000 in this example) and process them.
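The batching idea itself can be sketched without the client library. In the minimal sketch below, RecordBatcher is a hypothetical helper (it is not taken from the linked repository): it buffers records handed to it one at a time and passes each full batch to a sink. With the real client, accept() would be called from the per-record scan callback, and the sink would be whatever sends a batch over HTTP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: buffers records handed to it one at a time and
// passes each full batch to a sink (for example, an HTTP sender).
class RecordBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> buffer = new ArrayList<>();

    RecordBatcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Called once per record; synchronized because a scan callback may be
    // invoked from several threads at once.
    synchronized void accept(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered; call once more after the scan completes.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        sink.accept(batch);
    }

    public static void main(String[] args) {
        RecordBatcher<Integer> batcher = new RecordBatcher<>(
            1000, batch -> System.out.println("sending " + batch.size() + " records"));
        for (int i = 0; i < 2500; i++) {
            batcher.accept(i); // stands in for the per-record scan callback
        }
        batcher.flush(); // send the final partial batch
    }
}
```

Only one batch is held in memory at a time, so the JVM footprint depends on the batch size rather than on the total number of records.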

I hope this helps

Peter


#5

I’m currently evaluating Aerospike and have a similar need. Specifically, I’m looking to do infinite scroll of items, where the second request asks for, say, 25 items, carrying on from where the last page finished. In Redis, this can be achieved with SORT and LIMIT 25 25 (offset, then count). In DynamoDB, you can specify a limit, and along with the result data you will get a token that can be used to continue the query.

These items are strictly ordered, for example, reverse chronologically.

I could perhaps use an index query to do “less than” timestamp, but am curious how I can then limit the results. Perhaps an aggregate, or UDF? Any advice would be helpful!

Cheers,

Jamie.


#6

Hi Jamie

You are either doing a scan (like a table scan) of all the records in a set or namespace, or a query on a secondary index.

Scans and queries return records in no particular order; in fact, two successive scans or queries will return the records in different orders. There is no way to order the results. This means you cannot start scan/query 1 and read the first 100, terminate the scan, then start scan/query 2, skip the first 100 and read the next. There is no cursor or paging.

If you are doing a query, there is a trick you can do using the RecordSet collection. You can read 100 records, stop reading from the RecordSet and display them, then read another 100. The RecordSet uses a BlockingQueue between your application and the communications with the server. The queue holds about 5000 records.

When the RecordSet is closed, the Queue is destroyed and the Query Jobs on the server nodes are terminated.
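The mechanism can be illustrated with plain Java. The sketch below is a stand-in, not the client's RecordSet: QueuePager is a hypothetical class in which a producer thread (playing the role of the server-side query) fills a bounded BlockingQueue of capacity 5000, and the application pulls one page at a time. The producer blocks whenever the queue is full, which is the "waiting" that the trick relies on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for a RecordSet: a producer thread fills a bounded
// queue, and the consumer pulls one page at a time.
class QueuePager {
    private static final Integer END = Integer.MIN_VALUE; // end-of-results marker
    private final BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(5000);
    private boolean done = false;

    QueuePager(int totalRecords) {
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < totalRecords; i++) {
                    queue.put(i); // blocks while the queue is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    // Returns up to pageSize records; an empty list means the results are exhausted.
    List<Integer> nextPage(int pageSize) {
        List<Integer> page = new ArrayList<>();
        while (!done && page.size() < pageSize) {
            Integer item;
            try {
                item = queue.take(); // blocks until a record is available
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for records", e);
            }
            if (item.equals(END)) {
                done = true;
            } else {
                page.add(item);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        QueuePager pager = new QueuePager(250);
        List<Integer> page;
        while (!(page = pager.nextPage(100)).isEmpty()) {
            System.out.println("page of " + page.size() + " records");
        }
    }
}
```

Note that this only pages forward within one open result stream; it does not give a cursor that survives across requests.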

This might solve your problem, BUT BEWARE: query jobs on the server nodes use memory, and running many thousands of them and causing them to wait could exceed your server memory allocation.

Hope this helps

Peter


#7

Hello Peter,

Would there be any way to find out if a sort or limit feature would be in the product roadmap?

I’m evaluating AS now and this may be a problem. My particular use case is that I have a backend web app where I would show the latest 50 items (I have a bin that stores a time stamp) within a set of possibly tens of thousands of items.

I’m using the Go client.

Thanks


#8

Re: sort/limit - Certainly on the radar, but nothing concrete.

Another thought is to explore a Large Sorted List (LLIST), for example one holding all events for a user, using the timestamp as the sort key.


#9

Awesome thanks for the info.


#10

@wchu

Unless I’m missing something here, simply persisting the timestamp in an llist won’t work. I want to paginate through a list of users sorted by timestamp descending. Yet, persisting only the timestamp in an LLIST won’t allow me to reference / look up that respective user record.

Wouldn’t I need to store a Map in an LLIST? Such as { “key”: , “uid”: <user’s PK value> }, so that I can walk the list, grab a batch of, say, 10 items, then grab the respective user via the “uid” field? If so, there don’t seem to be any Go examples (i.e. tests) for working with “explicit values” (maps, for example).

This is quite a common use case (especially for more critical cases such as displaying news feeds) and I’ve been trying hard to find a simple solution.

Thanks.


#11

You are absolutely right. You will need to store a map in an LLIST. The map entry “key” is special: it determines the ordering of entries in the LLIST.

Java Client has an example here
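To illustrate the idea only, here is a minimal Java sketch that uses an in-memory TreeMap as a stand-in for the sorted list; it does not use the LLIST API. TimestampIndex and pageBefore are hypothetical names, the map key plays the role of the special “key” entry (a timestamp), and the value plays the role of “uid”. It assumes timestamps are unique, since a TreeMap keeps one value per key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted list of {"key": timestamp, "uid": ...} entries.
class TimestampIndex {
    private final NavigableMap<Long, String> byTimestamp = new TreeMap<>();

    // Assumes timestamps are unique; a duplicate would overwrite the earlier uid.
    void add(long timestamp, String uid) {
        byTimestamp.put(timestamp, uid);
    }

    // Returns up to pageSize entries strictly older than `before`, newest first.
    // Pass Long.MAX_VALUE for the first page, then the timestamp of the last
    // entry returned to continue from where the previous page ended.
    List<Map.Entry<Long, String>> pageBefore(long before, int pageSize) {
        List<Map.Entry<Long, String>> page = new ArrayList<>();
        for (Map.Entry<Long, String> e
                : byTimestamp.headMap(before, false).descendingMap().entrySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        return page;
    }

    public static void main(String[] args) {
        TimestampIndex index = new TimestampIndex();
        for (long t = 1; t <= 5; t++) {
            index.add(t, "user" + t);
        }
        List<Map.Entry<Long, String>> page = index.pageBefore(Long.MAX_VALUE, 2);
        while (!page.isEmpty()) {
            System.out.println(page);
            long cursor = page.get(page.size() - 1).getKey();
            page = index.pageBefore(cursor, 2);
        }
    }
}
```

Each “uid” in a page would then be used to fetch the corresponding user record by its primary key.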


#12

Using Aerospike’s Stream UDFs we can implement orderBy with a limit, but we cannot implement pagination. This is because Stream UDFs execute on all nodes, and we cannot paginate across a sharded environment.

see this: Order by option
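One way to picture orderBy with a limit across nodes is as a per-node top-N followed by a merge. The Java sketch below is only an illustration of that shape with in-memory lists, not a Stream UDF; TopNMerge, topN and mergeTopN are hypothetical names, and each inner list stands for the values held by one node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of "order by with limit" over sharded data.
class TopNMerge {
    // Keeps the `limit` largest values from one list, largest first.
    static List<Long> topN(List<Long> values, int limit) {
        List<Long> sorted = new ArrayList<>(values);
        sorted.sort(Collections.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    // Takes each node's own top `limit`, then merges them into a global top `limit`.
    static List<Long> mergeTopN(List<List<Long>> perNode, int limit) {
        List<Long> candidates = new ArrayList<>();
        for (List<Long> nodeValues : perNode) {
            candidates.addAll(topN(nodeValues, limit));
        }
        return topN(candidates, limit);
    }

    public static void main(String[] args) {
        List<List<Long>> nodes = new ArrayList<>();
        nodes.add(List.of(1L, 9L, 4L));
        nodes.add(List.of(7L, 3L));
        nodes.add(List.of(8L, 2L, 6L));
        System.out.println(mergeTopN(nodes, 3));
    }
}
```

In this shape, no single node knows where a later page starts in the global order, so getting page k by offset would mean every node returning its top k × pageSize candidates for the merge.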


#13

Hi Peter, I am having the same problem implementing an infinite scroll. I have a “product” bin containing a “timestamp”, on which I have created a secondary index. My query is like this: “fetch 10 products from the given timestamp value”. If you could help me with this, it would be very kind of you.