What is the right way to get all the records in a set?


#1

I have a few sets (each stores rows that have a key and a bin), with the number of records on the order of 1k-30k. I use the Java client. I want to retrieve all the records in these sets. There are 2 ways I can think of:

  • client.scanAll
  • client.query with a statement param that has no filter at all, iterating over all the (key, record) pairs

scanAll says: "This call will block until the scan is complete - callbacks are made within the scope of this call." RecordSet’s next says: "This method will block until a record is retrieved or the query is cancelled."

It’s not very clear to me what these blocks mean. Is only the calling thread blocked, or the server as well? And overall, which gives better performance? Ideally I would like the total response time to be around 1s.

Which of the 2 is better, and is there any better way?


#2

A query without a filter is transformed into a scan by the client library.

For the “scanAll” method, the client thread will wait for all server nodes to respond with the results of your scan before proceeding. There is also a scan implementation that accepts a callback, which processes the results as they are returned by the server.
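To illustrate what "block" means for the caller, here is a minimal, self-contained Java sketch. It does not use the Aerospike client: `BlockingScanSketch`, its `scanAll`, and `readAllIds` are hypothetical stand-ins that simulate one worker thread per node. It assumes that the callback may be invoked from several threads at once (which I believe is the case for the Java client when nodes are scanned concurrently), so the results go into a thread-safe collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical simulation of a blocking, callback-based scan.
// This is NOT the Aerospike client API.
final class BlockingScanSketch {

    // Simulates a scan: one worker thread per "node", each feeding its
    // records to the callback. The method returns only after every worker
    // has finished, so only the calling thread is blocked.
    static void scanAll(List<List<String>> nodes, Consumer<String> callback) {
        List<Thread> workers = new ArrayList<>();
        for (List<String> nodeRecords : nodes) {
            Thread worker = new Thread(() -> nodeRecords.forEach(callback));
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait for this node's results
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("scan interrupted", e);
            }
        }
    }

    // Collects every record id. The callback runs on several threads,
    // so a thread-safe queue is used instead of a plain ArrayList.
    static List<String> readAllIds(List<List<String>> nodes) {
        Queue<String> ids = new ConcurrentLinkedQueue<>();
        scanAll(nodes, ids::add);
        // No sleep is needed here: scanAll has already returned,
        // which means all callbacks have completed.
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        List<List<String>> nodes = new ArrayList<>();
        for (int n = 0; n < 3; n++) {
            List<String> records = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                records.add("id-" + n + "-" + i);
            }
            nodes.add(records);
        }
        System.out.println(readAllIds(nodes).size()); // prints 3000
    }
}
```

In this sketch, swapping the `ConcurrentLinkedQueue` for a plain `ArrayList` could lose records when two callbacks add at the same moment, which is one way a callback-based scan can appear to return a different count on each run.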

You should also be aware that the current scan implementation does not allow concurrent scans to run on a single node. There is a scan queue and scans are processed one by one in the order they arrive. The upcoming 3.6.x release will allow concurrent scans.


#3

As mentioned in http://www.aerospike.com/docs/guide/scan.html, will a set scan request scan the entire namespace? In that case I should probably use a LargeMap instead, which holds all the key-value pairs I need. Any thoughts?


#4

I don’t think that would be a more performant option. With LargeMap, all your requests for this data would have to go to the same nodes and contend for this record.

I would first benchmark the scans and see if they can handle your performance requirement. (You could also do the same with LargeMap.) If not, you could use a secondary index: store the set name in a bin and index that bin. This solution would also get around the lack of concurrent scans, which will not be an issue in the upcoming 3.6.x release. Optimizing set scans is also on the roadmap, though I’m not sure if it made it to 3.6.x.


#5

I tried both, and I find that LargeMap suits our use case better. The response times of the normal scans are completely non-deterministic.

To be clearer:

  • The number of calls requesting data from these small tables will be in the tens to hundreds per day, and as long as we get the response within 2 seconds we will be fine.
  • We have 100 million+ records in the same namespace across different sets.
  • What we are finalizing is that each of these small tables is stored in a single record as a LargeMap.

Examples:

  1. set = largeMaps, PK = table1, bin_name = ‘lmap’, bin_value = {k1:v1, k2:v2…}
  2. set = largeMaps, PK = table2, bin_name = ‘lmap’, bin_value = {k11:v11, k22:v22…}

#6

Hi. LargeMap is deprecated, so you should move away from it, and definitely not start with it. The LargeList type will continue to be supported and enhanced.


#7

@yesteapea,

We just released Aerospike Server Community Edition 3.6.0, which features a number of scan improvements, such as the ability to run concurrent scan jobs, and major scan performance enhancements; collectively, these should solve your scan issues. Let us know.

You can read more about the features and fixes of 3.6.0 in its release notes, and download it here.


#8

I’m using C#. Aerospike 3.6.

I need to retrieve the Id/key of every record in a set. The key is not stored (WritePolicy.sendKey is false; in any case it is null when I try to read it). The ScanAll function does not have any overload without the ScanCallback. I’m using a function like this:

public IEnumerable<string> ReadItemsIds()
{
    ScanPolicy policy = clientProvider.Client.scanPolicyDefault;
    var ids = new List<string>();
    ScanCallback callback = (key, record) => ids.Add(record.GetString("Id"));
    clientProvider.Client.ScanAll(policy, configuration.Namespace, configuration.ItemsSetName, callback, "Id");

    Thread.Sleep(1000); // wait the scan callback (?!)
    return ids;
}

It is returning a different number of records on every call (between 1700 and 1716).

  1. Why?
  2. What is the best way to accomplish my task?

Regards.

Alex


#9

I solved it using this:

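public IEnumerable<string> ReadItemIds()
{
    var policy = clientProvider.Client.queryPolicyDefault;
    policy.consistencyLevel = ConsistencyLevel.CONSISTENCY_ALL;
    var ids = new List<string>();

    // Restrict the query to the items set; without SetName the whole namespace is read.
    var statement = new Statement()
    {
        Namespace = configuration.Namespace,
        SetName = configuration.ItemsSetName,
        BinNames = new string[] { "Id" }
    };

    // Next() blocks until a record is retrieved or the query ends.
    using (var recordset = clientProvider.Client.Query(policy, statement))
    {
        while (recordset.Next())
        {
            ids.Add(recordset.Record.GetString("Id"));
        }
    }

    return ids;
}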