100% read load test with a Zipfian distribution (80% of operations on 20% of the data) in the YCSB benchmark shows 3x lower IOPS, but improves after running 95% read / 5% update

Hi Community, I am facing a very strange scenario when running a 100% read load test with YCSB using a Zipfian distribution (80% of operations on 20% of the records) immediately after freshly loading the database. With 100% reads right after the load, I get a third of the normal IOPS (150K/sec) at only 20% CPU. Increasing the number of client threads raises neither CPU usage nor IOPS; only latency goes up, by a lot. But once I run 95% read / 5% update, IOPS are back to normal (450K/sec), and if I run 100% read after the 95:5 run, IOPS stay normal (450K/sec).

Could anyone help with the probable causes of the lower IOPS when running a 100% read load test with a Zipfian distribution just after freshly loading the Aerospike database? Note that with a uniform distribution (every record equally likely to be picked) the issue does not occur: after freshly loading data and running 100% read, IOPS are normal.
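To make the difference between the two distributions concrete, here is a minimal Python sketch (not YCSB's own generator) that computes what share of requests lands on the hottest 20% of records when record rank i is drawn with probability proportional to 1/i^0.99 (0.99 being, as far as I know, YCSB's default Zipfian constant), versus a uniform draw. The record count is a hypothetical placeholder because the thread does not state it. YCSB's zipfian distribution also scrambles (hashes) the ranks so the hot records are scattered across the keyspace rather than being the first or last keys loaded; the sketch ignores that step.

```python
"""Share of requests hitting the hottest keys: Zipf(0.99) vs uniform (illustrative only)."""


def zipf_top_fraction(n_items: int, hot_fraction: float, theta: float = 0.99) -> float:
    """Fraction of requests landing on the hottest `hot_fraction` of `n_items`,
    assuming rank i (1-based) is chosen with probability proportional to 1 / i**theta."""
    weights = [1.0 / (i ** theta) for i in range(1, n_items + 1)]
    hot_count = int(n_items * hot_fraction)
    return sum(weights[:hot_count]) / sum(weights)


def uniform_top_fraction(n_items: int, hot_fraction: float) -> float:
    """Under a uniform distribution every record is equally likely to be picked."""
    return int(n_items * hot_fraction) / n_items


if __name__ == "__main__":
    n = 1_000_000  # hypothetical record count; not stated in the thread
    print(f"zipfian: {zipf_top_fraction(n, 0.2):.1%} of reads hit 20% of records")
    print(f"uniform: {uniform_top_fraction(n, 0.2):.1%} of reads hit 20% of records")
```

The point of the comparison is that under Zipfian most reads repeatedly touch a small set of records, so anything that caches (or fails to cache) those particular records has an outsized effect, whereas the uniform case spreads reads evenly over the whole device.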

@kporter :pray:

What does your namespace configuration stanza look like?

@pgupta thanks a lot for getting back to me. Below is the namespace configuration stanza; it is exactly the same on all three nodes of the cluster I have set up.

nvme1n1 is a 330 GB SSD, of which we have loaded 270 GB of data.

namespace ycsb {
        replication-factor 2
        memory-size 28G


        storage-engine device {
            device /dev/nvme1n1
            write-block-size 128K
        }
}

As the screenshot below shows, 128 client threads achieved the same RPS as 256 threads at the same CPU level, but with a big difference in latency.
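The "same RPS, much higher latency" pattern is what Little's law predicts once throughput has hit a ceiling: for a closed-loop load generator, mean latency is roughly concurrency divided by throughput. A minimal sketch, assuming each YCSB thread keeps exactly one request in flight with no think time, and using the 150K/sec figure from the slow runs as the throughput (the screenshot's exact numbers are not reproduced here):

```python
def closed_loop_latency_ms(threads: int, throughput_per_sec: float) -> float:
    """Little's law (L = lambda * W) for a closed-loop client.

    Assumes each thread has one outstanding request, so concurrency L equals
    the thread count, and returns mean latency W = L / lambda in milliseconds.
    """
    return threads / throughput_per_sec * 1000.0


if __name__ == "__main__":
    rps = 150_000  # throughput reported for the slow 100% read runs
    for threads in (128, 256):
        print(f"{threads} threads -> ~{closed_loop_latency_ms(threads, rps):.2f} ms mean latency")
```

With throughput pinned, doubling the threads from 128 to 256 simply doubles the mean time each request spends queued, which matches the observation that more client threads raised latency but not IOPS or CPU.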

But once I run 95% read / 5% update, everything is back to normal.

Also, after running 95% read / 5% update and then running 100% read, I see normal behaviour.

Interesting observation. My conjecture is that there is some kind of bias in your test setup that may be getting addressed once the post-write-queue holds data from across the device. I assume the default of 256 blocks of 128KiB … which in your case is holding about 12% of the recently updated data. Once you run the 5% update load, you get the benefit of reading from the post-write-queue “cache”, which now holds data updated from random areas (blocks) of the device, as opposed to only the last 256 blocks written in the first case. Can you repeat your test with post-write-queue set to 0? It's a dynamic config parameter.
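To put the post-write-queue in proportion to the loaded data, here is a small arithmetic sketch. It assumes the 256-block default mentioned above, a queue kept per device, the 128K write-block-size from the stanza, and reads "270 GB" as 270 GiB loaded on each node's device; all three are assumptions, not measured values.

```python
KIB = 1024
GIB = 1024 ** 3

WRITE_BLOCK_SIZE = 128 * KIB   # write-block-size 128K from the namespace stanza
POST_WRITE_QUEUE_BLOCKS = 256  # assumed default post-write-queue length, in blocks
LOADED_BYTES = 270 * GIB       # "270 GB" loaded per device, read here as GiB (assumption)

# Bytes of recently written blocks the queue can serve reads from.
queue_bytes = POST_WRITE_QUEUE_BLOCKS * WRITE_BLOCK_SIZE
# Number of full write blocks the initial load occupies on the device.
loaded_blocks = LOADED_BYTES // WRITE_BLOCK_SIZE
# Share of the loaded blocks that the queue covers right after the load.
queue_share = POST_WRITE_QUEUE_BLOCKS / loaded_blocks

if __name__ == "__main__":
    print(f"post-write-queue holds {queue_bytes / 2**20:.0f} MiB")
    print(f"loaded data fills {loaded_blocks:,} write blocks")
    print(f"queue covers {queue_share:.4%} of those blocks")
```

Right after the load, the queue therefore covers only the last 256 of roughly 2.2 million blocks, i.e. whichever records happened to be written last, rather than the hot records a Zipfian read load keeps hitting. To run the suggested experiment, my understanding is that on the 4.x to 6.x servers that still use `memory-size`, the queue can be changed at runtime with `asinfo -v 'set-config:context=namespace;id=ycsb;post-write-queue=0'`; please check the configuration reference for your server version before relying on that exact syntax.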