I have a cluster of 4 servers. One of the namespaces is backed by a raw device, which is a SAS mechanical hard drive.
Now here is the weird part of the story. I am running one of the tests with small records (2 × 50 bytes = 100 bytes total). I can write at between 150-200k OPS, but when it comes to reading, the throughput drops to 4k OPS! Yes, I know this might seem weird, and I am totally confused.
The servers show very little load during the reads, and iotop and nload show nothing I can identify as a problem.
No. The Aerospike database is intended to be a high-performance, low-latency database, and the physical limitations of rotational disks add an unacceptable amount of latency to data access.
Hmmm, interesting … thinking aloud: since your records are only 100 bytes, each one probably occupies 256 bytes on device (record overhead plus rounding up to a 128-byte boundary). With the default write-block-size of 1 MB, you are fitting about 4K records per 1 MB block in RAM while writing, and each block is asynchronously flushed to disk as a single 1 MB write. On read, though, you are reading individual records from disk in 128-byte chunks. If you are reading a recently updated record, you are probably getting it from the post-write-queue in RAM; otherwise you are hitting the disk. So your read delay is coming from the slow random-read performance of the disk for records that have to be fetched from it.

If write-block-size were 128K, you would fit only about 500 records per block. You can play with write-block-size on a test cluster and see if the performance tracks.

Check the write-q value in /var/log/aerospike/aerospike.log to see if the disk is slow; if the disk is not the bottleneck, write-q will stay at zero under write throughput. You also have a very large max-write-cache of 8G (64M is the default), which is helping you with the writes. Finally, you can test with post-write-queue reduced to a very small number and see if read throughput gets worse.
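The arithmetic above can be sketched quickly. This is just a back-of-envelope illustration: the exact per-record overhead depends on the server version, so the 64-byte figure below is an assumption chosen to land a 100-byte record in a 256-byte slot, as described.

```python
RECORD_BYTES = 100   # 2 x 50-byte values, per the test above
BOUNDARY = 128       # device records are rounded up to a 128-byte boundary
OVERHEAD = 64        # assumed per-record overhead, for illustration only

def device_slot(record_bytes, overhead=OVERHEAD, boundary=BOUNDARY):
    """Round record + overhead up to the next 128-byte boundary."""
    raw = record_bytes + overhead
    return ((raw + boundary - 1) // boundary) * boundary

slot = device_slot(RECORD_BYTES)          # 256 bytes per record on device
per_1mb = (1024 * 1024) // slot           # records per default 1 MB block
per_128k = (128 * 1024) // slot           # records per 128 KB block

print(slot, per_1mb, per_128k)            # 256 4096 512
```

This reproduces the numbers in the reply: roughly 4K records per 1 MB write block, and about 500 per 128 KB block.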