Scan or batch? Which one is better and when?

Hello everyone

I’m trying Aerospike for the first time, and one of the things I’m exploring is retrieving a number of records in a single request.

I’ve experimented with doing that using scans and batch gets, and both have advantages and disadvantages.

I was wondering if there is a recommended best practice for this type of data retrieval. Although retrieving all the records doesn’t need to be super fast, I will have thousands of clients doing it at the same time.

Any help is greatly appreciated. Cheers

DX


Use ‘get’ when you only need to fetch a single record or a small number of records. A batch has a bit of overhead: for a large number of records it will outperform repeated gets, but get may outperform batch when the number of records is small. What counts as large or small depends on your cluster, so you would need to experiment.

Use ‘batch’ when fetching many known keys. Batch will normally outperform scans because a scan must traverse the entire index on every node.

Use ‘scan’ when the keys are not known, or when you need all records or some fraction of them.

Use ‘query’ when you know you will repeatedly perform the same lookup on a criterion that secondary indexes support, and you are willing to pay the memory cost of a secondary index. A query is typically faster than a scan.
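To make the four rules of thumb above easier to compare, here is a minimal sketch of them as a plain Python function. It does not call the Aerospike client at all. The function name `choose_operation`, its parameters, and the threshold of 10 are all made up for illustration; the real crossover between get and batch has to be measured on your own cluster.

```python
from typing import Optional

# Hypothetical threshold: the real crossover between repeated gets and a
# single batch depends on the cluster and must be found by experiment.
ASSUMED_BATCH_THRESHOLD = 10


def choose_operation(
    known_keys: Optional[int],
    has_secondary_index: bool = False,
    batch_threshold: int = ASSUMED_BATCH_THRESHOLD,
) -> str:
    """Return 'get', 'batch', 'query' or 'scan' following the rules of thumb.

    known_keys is how many keys the caller already has, or None (or 0)
    when the keys are not known in advance.
    """
    if known_keys:
        # Keys are known: a few of them favour get, many favour batch.
        return "get" if known_keys < batch_threshold else "batch"
    if has_secondary_index:
        # Keys unknown, but the lookup criterion is covered by an index.
        return "query"
    # Keys unknown and no usable index: fall back to a scan.
    return "scan"
```

For example, `choose_operation(3)` gives `'get'`, `choose_operation(5000)` gives `'batch'`, and `choose_operation(None)` gives `'scan'`.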

All of the above also support additional filters that return only the records passing them; see predicate expressions.
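Since the point where batch starts to beat individual gets depends on the cluster, one way to find it is to time both approaches over a range of key counts. The sketch below is only a harness: `get_one` and `get_many` are hypothetical callables that you would wrap around whatever single-record and batch read calls your client version provides, and `compare` and `time_call` are names invented here.

```python
import time
from typing import Callable, Dict, List, Sequence


def time_call(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compare(
    get_one: Callable[[object], object],
    get_many: Callable[[List[object]], object],
    keys: Sequence[object],
    sizes: Sequence[int],
    repeats: int = 5,
) -> List[Dict[str, float]]:
    """Time one-by-one reads against a single batch read for each size."""
    results = []
    for n in sizes:
        subset = list(keys[:n])
        # One request per key versus one request for the whole subset.
        t_get = time_call(lambda: [get_one(k) for k in subset], repeats)
        t_batch = time_call(lambda: get_many(subset), repeats)
        results.append({"n": len(subset), "get_s": t_get, "batch_s": t_batch})
    return results
```

You could call it as `compare(my_get, my_batch_get, keys, [1, 10, 100, 1000])` and look for the first size where `batch_s` drops below `get_s`. Run it from several client machines at once if you want numbers that reflect thousands of concurrent clients.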
