How do I compare latency and timeouts at server side between two Aerospike clusters of different versions - 3.8 and 3.14?

I have two Aerospike clusters:

- Older cluster: data stored on disk, i2.2xlarge instances, Aerospike build version 3.8.2.3

- Newer cluster: data stored in memory, r3.4xlarge instances, Aerospike build version 3.14.1.1, using partition-tree-sprigs

I want to compare the server-side latencies and timeouts on them. I enabled the asgraphite daemon, which ships with Aerospike, using the following command:

python /opt/aerospike/bin/asgraphite --start --prefix aerospike.stats -g -p

I cannot see the latency stats for the old cluster in the Graphite console.

Also, I am unsure which latency stat I should consider. The following stats are available on the older cluster:

| Metric | Value observed |
| --- | --- |
| batch_index_timeout | 0 |
| batch_timeout | 0 |
| err_tsvc_requests_timeout | ~80K |

The batch stats show 0, as expected, because we are not running any batch queries. The new cluster, being on a version newer than 3.9, does not have the err_tsvc_requests_timeout metric.

Link to the question on Stack Overflow: How do I compare latency and timeouts at server side between two Aerospike clusters of different versions - 3.8 and 3.14?

There was a stat/log reorganization in 3.9; the metrics reference page should show where each stat was moved. Some stats and histograms were refined to measure only what they were meant to measure, so comparing pre-3.9 stats to post-3.9 stats may not be an apples-to-apples comparison.

I noticed a typo on the metrics page for err_tsvc_requests_timeout; it should have directed you to client_tsvc_timeout.
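To make the renaming concrete, here is a minimal Python sketch of picking whichever timeout counter a node reports. It assumes you have already fetched the statistics text yourself and that it is in the semicolon-separated key=value form; the helper names and the sample strings are hypothetical (the ~80K figure is taken from the question, the 120 is made up). These counters appear to be cumulative, so comparing a per-second rate between two samples is probably fairer than comparing raw totals.

```python
def parse_info_stats(raw):
    """Parse a 'key=value;key=value' statistics string into a dict of strings."""
    stats = {}
    for pair in raw.strip().strip(";").split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        key, value = pair.split("=", 1)
        stats[key.strip()] = value.strip()
    return stats


# Metric names from this thread: the post-3.9 name first, then the pre-3.9 name.
TIMEOUT_METRICS = ("client_tsvc_timeout", "err_tsvc_requests_timeout")


def find_tsvc_timeouts(stats):
    """Return (metric_name, count) for the first known timeout metric present."""
    for name in TIMEOUT_METRICS:
        if name in stats:
            return name, int(stats[name])
    return None, None


def timeouts_per_second(count_before, count_after, interval_seconds):
    """Turn two samples of a cumulative counter into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (count_after - count_before) / interval_seconds


# Hypothetical sample strings, not real server output.
old_sample = "batch_index_timeout=0;batch_timeout=0;err_tsvc_requests_timeout=80000"
new_sample = "client_tsvc_timeout=120"

for label, raw in (("old cluster", old_sample), ("new cluster", new_sample)):
    name, count = find_tsvc_timeouts(parse_info_stats(raw))
    print(f"{label}: {name} = {count}")
```

Even with the names lined up this way, the caveat above still applies: the two counters may not measure exactly the same thing across the 3.9 boundary.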

It depends on what you want to track. If you’re not using batch-reads, for example, there’s no reason to track it.

The description of each component of the latency histogram is here: http://www.aerospike.com/docs/operations/monitor/latency