We would like to evaluate the performance of Aerospike 3.9 with a decent SSD configuration. Our Aerospike server runs on an ESX VM with 8 CPUs and 8 GB of RAM. All the Aerospike clients run on the same ESX host, and each client VM has 4 CPUs and 4 GB of RAM. The ESX host has 20 cores and 72 GB of RAM.
Before running the Aerospike benchmark, we measured the raw performance of the SSD, which is attached to the Aerospike server's Linux VMware VM as an RDM (Raw Device Mapping) disk.
With a 4K block size, we got the following performance numbers:
- Case 1: IODepth = 1, number of jobs = 1
  Random writes = 472 MB/s, 11K IOPS; random reads = 233 MB/s, 5K IOPS
- Case 2: IODepth = 32, number of jobs = 1
  Random writes = 341 MB/s, 85K IOPS; random reads = 436 MB/s, 109K IOPS
We then used the same SSD to run the Aerospike benchmark (included in the Aerospike C client library source repository).
With a 4K value size and 10 million keys, the benchmark reported the following numbers:
- Number of threads = 1
  Random reads = 2K TPS (RU,100); random writes = 2K TPS (RU,0)
- Number of threads = 32
  Random reads = 25K TPS; random writes = 19-20K TPS
Since the numbers were not satisfactory, we changed the default configuration to:
    service-threads 16
    transaction-queues 12
    transaction-threads-per-queue 3
However, the TPS numbers remained almost the same, so we tried more threads on the client:
- Number of threads = 128
  Random reads = 33K TPS
iostat on the SSD consistently shows output similar to the following:
Device:    rMB/s   wMB/s   avgrq-sz   avgqu-sz   await
sdb       133.73    0.00       9.00       8.40    0.28
On average there are only 8-9 outstanding read requests on the SSD, each about 9 sectors (roughly 4.5 KB) in size. The overall load average on the Aerospike server is very high; top shows asd fully using close to 5 of the 8 CPUs.
We also tried running the benchmark from multiple client VMs, but overall performance does not scale.
We also looked into the code, and it seems that:
- Aerospike does not use libaio or POSIX AIO for I/O on the SSD; reads are served with a combination of lseek + read.
- Only a limited number of threads (probably 5) perform synchronous I/O on the SSD.
Are our observations correct? What can we do to improve the performance?
Thanks and regards, Prasad