Aerospike Performance issue and Understanding asloglatency tool

I am encountering problems with scalability of reads on Aerospike. I am trying to determine the problem and I see that Aerospike provides a tool called asloglatency to monitor latency within the db.

There are many options, but the descriptions in the manual shed little light on which part of the lookup pipeline each one measures.

The problem I am having right now is that a batch read of 1000 rows is pretty slow and its latency varies greatly. I would like to understand where inside the db the bottleneck or jitter appears.
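A minimal sketch of how the latency spread could be quantified from the client side, so that it can be compared against the server-side histograms. It assumes nothing about the Aerospike client API: `do_batch_read` is a hypothetical placeholder for the real 1000-key batch call, and the helper names are made up for illustration.

```python
import statistics
import time


def measure_latencies(op, runs=100):
    """Call op() `runs` times and return each call's wall-clock latency in ms."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        op()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies_ms):
    """Return min, median, p99 and max; jitter shows up as a wide spread."""
    ordered = sorted(latencies_ms)
    # Nearest-rank style index, clamped to the last element for small samples.
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p99": ordered[p99_index],
        "max": ordered[-1],
    }


def do_batch_read():
    # Hypothetical placeholder: replace the sleep with the real 1000-key
    # batch read issued through whichever Aerospike client is in use.
    time.sleep(0.001)


if __name__ == "__main__":
    print(summarize(measure_latencies(do_batch_read, runs=50)))
```

A large gap between the median and the p99 or max on the client, with no matching gap in the server histograms, would point at the network or the client rather than the db.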

If anyone can help answer the following questions or give me some tuning pointers, that would be very helpful.

  1. Can someone walk me through, in detail, the stages the Aerospike db goes through from the moment a query or a batch read is issued on the client to the Aerospike server?

  2. What do the following options of the asloglatency tool actually measure?

  • reads_cleanup - Histogram around as_storage_record_close and as_record_done.
  • reads_internal - Read histogram from internal to rw_complete. (What is internal, and what is rw_complete?)
  • reads_net - Histogram around the network send on reads. (From the moment the data starts being sent until all of it has been sent back to the client?)
  • reads_q_process - Histogram from transaction off queue to read_start. (What is the difference between this and batch_q_process?)
  • reads_resolve - Histogram that tracks duplicate resolution after receiving all messages from other nodes.
  • reads_resolve_wait - Histogram that tracks the time the master waits for other nodes to complete duplicate resolution on reads. (What is duplicate resolution, and does it increase with more nodes?)
  • reads_start - Read histogram from read_start to internal. (What does read_start mean: the time when the read request starts being processed? What does internal consist of?)
  • reads_storage_open - Histogram around as_storage_record_open. (What is as_storage_record_open: an access to RAM or to disk?)
  • reads_storage_read - Histogram taken from after opening the device to after reading from device. (The time taken to read from RAM or disk?)
  • reads_tree - Histogram from rw_complete to fetching record from rb tree. (What is rw_complete? Is the rb tree the structure that stores the record keys and their pointers to the data?)
  • batch_q_process - Histogram of time spent processing batch messages in transaction queue. (Is this the time spent popping batch reads off the transaction queue?)
  • batch_index_reads - This metric is not explained in the manual. (What does it actually mean?)

Any help with, or links to, a breakdown of the lookup pipeline in Aerospike for batch reads and random key-value lookups would be greatly appreciated!

I would recommend the following articles: