Hi,
We are using Aerospike for a batch-read-only use case (no writes, no single-record reads).
We have found that latency is lower when using more, smaller nodes (we run on Google Cloud).
CLUSTER A : 2 nodes
Node config :
16 vCPU
30 GB RAM
1x local SSD 375 GB
1x persistent SSD 400 GB (configured as shadow device)
batch-index
10:18:55 10 19.70 1.97 0.00 6284.9
10:19:05 10 15.25 1.65 0.00 7013.9
10:19:15 10 16.00 1.51 0.00 5514.6
10:19:25 10 15.66 1.58 0.00 6931.0
-------------- ------ ------ ------ ----------
avg 16.44 1.36 0.01 7002.0
max 22.35 5.96 1.57 7286.6
CLUSTER B : 3 nodes
Node config :
8 vCPU
10 GB RAM
1x local SSD 375 GB
1x persistent SSD 400 GB (configured as shadow device)
batch-index
10:36:41 10 3.37 0.08 0.00 7181.4
10:36:51 10 3.85 0.10 0.00 7156.7
10:37:01 10 3.52 0.08 0.00 7116.6
10:37:11 10 3.79 0.10 0.00 7218.2
-------------- ------ ------ ------ ----------
avg 7.00 0.16 0.00 6243.0
max 51.49 3.76 0.17 7397.2
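For anyone who wants to compare the slices numerically, here is a minimal sketch that averages the pasted rows. It assumes the columns are: time, slice length in seconds, % of operations slower than 1 ms, 8 ms and 64 ms, and ops/sec. That column layout is an assumption on my side, and the names `COLUMNS`, `parse_rows` and `summarize` are only illustrative.

```python
from statistics import mean

# Assumed column layout: time, slice seconds, % of ops slower than
# 1 ms / 8 ms / 64 ms, then ops/sec.
COLUMNS = ("time", "slice_s", "pct_gt_1ms", "pct_gt_8ms", "pct_gt_64ms", "ops_per_sec")

CLUSTER_A = """\
10:18:55 10 19.70 1.97 0.00 6284.9
10:19:05 10 15.25 1.65 0.00 7013.9
10:19:15 10 16.00 1.51 0.00 5514.6
10:19:25 10 15.66 1.58 0.00 6931.0
"""

CLUSTER_B = """\
10:36:41 10 3.37 0.08 0.00 7181.4
10:36:51 10 3.85 0.10 0.00 7156.7
10:37:01 10 3.52 0.08 0.00 7116.6
10:37:11 10 3.79 0.10 0.00 7218.2
"""


def parse_rows(text):
    """Turn whitespace-separated slice rows into dicts; skip anything else."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # separators and avg/max rows have fewer fields
        row = {"time": fields[0]}
        for name, value in zip(COLUMNS[1:], fields[1:]):
            row[name] = float(value)
        rows.append(row)
    return rows


def summarize(text):
    """Mean of each percentage column and of ops/sec over the parsed slices."""
    rows = parse_rows(text)
    return {name: mean(r[name] for r in rows) for name in COLUMNS[2:]}


if __name__ == "__main__":
    for label, text in (("A", CLUSTER_A), ("B", CLUSTER_B)):
        print(label, summarize(text))
```

Note that this only covers the four slices shown for each cluster, so the result will not match the `avg` rows, which cover the whole run.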
The load was exactly the same on both clusters: batch-index reads of 200 keys each against a collection of 40M records. The namespace is configured to store data on SSD with the index in memory.
As you can see, on cluster A about 16% of reads took more than 1 ms, versus only 7% on cluster B. Neither cluster was CPU-bound during the test.
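To make the per-node load concrete, here is a rough sketch of how each 200-key batch and the 40M records would split across nodes. It assumes keys are spread evenly over the nodes and that each key is read from exactly one node; the replication factor is not taken into account. `per_node_share` is just a helper name for this example.

```python
def per_node_share(total, nodes):
    """Even split of `total` items over `nodes` nodes (assumes uniform distribution)."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return total / nodes


BATCH_KEYS = 200        # keys per batch request
RECORDS = 40_000_000    # records in the collection

if __name__ == "__main__":
    for label, nodes in (("A", 2), ("B", 3)):
        keys = per_node_share(BATCH_KEYS, nodes)
        records = per_node_share(RECORDS, nodes)
        print(f"cluster {label}: ~{keys:.0f} keys per batch per node, "
              f"~{records / 1e6:.1f}M records per node")
```

Since each node has exactly one local SSD, the per-node share is also the per-disk share under these assumptions.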
My questions are:
- Is the difference only due to the second cluster having one extra SSD, so that the load is shared between 3 disks instead of 2? Local SSDs on Google Cloud have extremely high IOPS limits, so we would not expect this to make a noticeable difference.
- Is it possible that it is also related to each node holding fewer keys in memory, so that retrieving a key is faster?
- Does the batch-index histogram also include client latency? Sometimes we notice higher latencies and I am not sure whether they come from Aerospike or not.
- As we only use batch reads, is it possible to tune Aerospike to perform better for them, even if that increases write/single-record read latencies?
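Regarding the client latency question, this is roughly how the latency could be measured on the client side so it can be compared with the server histogram. It is only a sketch: `fetch_batch` is a hypothetical callable standing in for whatever batch-read call the application uses, and the 1/8/64 ms thresholds are assumed to match the histogram columns.

```python
import time

# Assumed to match the server histogram buckets (1 ms, 8 ms, 64 ms).
THRESHOLDS_MS = (1, 8, 64)


def time_calls(fetch_batch, batches):
    """Call fetch_batch(keys) for each batch and return wall-clock latencies in ms."""
    latencies_ms = []
    for keys in batches:
        start = time.perf_counter()
        fetch_batch(keys)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def pct_over(latencies_ms, thresholds_ms=THRESHOLDS_MS):
    """Percentage of calls slower than each threshold."""
    n = len(latencies_ms)
    if n == 0:
        return {t: 0.0 for t in thresholds_ms}
    return {
        t: 100.0 * sum(1 for lat in latencies_ms if lat > t) / n
        for t in thresholds_ms
    }


if __name__ == "__main__":
    # Placeholder for the real batch read: sleeps 2 ms instead of hitting a cluster.
    def fake_fetch_batch(keys):
        time.sleep(0.002)

    batches = [list(range(200)) for _ in range(5)]
    print(pct_over(time_calls(fake_fetch_batch, batches)))
```

If the client-side percentages are much higher than the server-side ones for the same period, the extra time would be spent outside the server (network, client threads, and so on).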
Thanks.