100% Read load test in zipfian distribution (80% operation on 20% data) with YCSB benchmark shows 3 times less IOPS. But improves after running 95% Read and 5% Update

Sovit_Patnaik · June 20, 2023, 1:11pm

Hi Community, I am seeing a very odd scenario while running a 100% read load test with YCSB using the Zipfian distribution (80% of operations on 20% of records) right after freshly loading the database with records. With 100% read straight after the load, I get a third of the normal IOPS (150K/sec) at just 20% CPU. Increasing the client threads raises neither CPU usage nor IOPS, but latency increases a lot. Once I run 95% read and 5% update, the IOPS are back to normal (450K/sec). If I then run 100% read again after the 95:5 run, the IOPS stay normal (450K/sec).

Could anyone help with the probable causes of low IOPS when running a 100% read load test with the Zipfian distribution right after freshly loading the Aerospike database? Point to note: with the Uniform distribution (each record has an equal chance of being picked) there is no issue. After freshly loading data and running 100% read, IOPS are normal.

@kporter
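
For readers unfamiliar with the two request distributions being compared, here is a minimal sketch of how skewed a Zipfian workload is relative to a uniform one. It is not YCSB's generator: it simply weights the key of rank k by 1 / k**theta and reports the share of requests that land on the most popular 20% of keys. The function name `hot_share`, the key count, and theta = 0.99 (assumed here to match YCSB's default Zipfian constant) are illustrative assumptions.

```python
def hot_share(n_keys: int, hot_fraction: float, theta: float) -> float:
    """Share of requests that hit the most popular `hot_fraction` of keys
    when the key of rank k is requested with weight 1 / k**theta.
    theta = 0 gives a uniform distribution."""
    weights = [1.0 / (rank ** theta) for rank in range(1, n_keys + 1)]
    n_hot = int(n_keys * hot_fraction)
    return sum(weights[:n_hot]) / sum(weights)


if __name__ == "__main__":
    n = 100_000  # hypothetical key count, not the data set from this thread
    print(f"uniform: {hot_share(n, 0.2, 0.0):.3f}")
    print(f"zipfian: {hot_share(n, 0.2, 0.99):.3f}")  # theta is an assumption
```

With theta = 0 the hot 20% of keys receive 20% of requests, while with a Zipfian weighting they receive a much larger share, so a small set of records (and the device blocks holding them) takes most of the read load.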

pgupta · June 20, 2023, 4:05pm

What does your namespace configuration stanza look like?

Sovit_Patnaik · June 20, 2023, 5:11pm

@pgupta thanks a lot for getting back to me. Below is the namespace configuration stanza; it is exactly the same on all three nodes of the cluster that I have set up.

The nvme1n1 device is an SSD with 330 GB of storage, of which we have loaded 270 GB of data.

namespace ycsb {
    replication-factor 2
    memory-size 28G

    storage-engine device {
        device /dev/nvme1n1
        write-block-size 128K
    }
}

You can see in the screenshot below that 128 client threads gave the same RPS as 256, at the same CPU level, but with a big difference in the latency metrics.

But once I run 95% read and 5% update, everything is back to normal.

Also, after running 95% read and 5% update and then running 100% read, I see normal behaviour.

pgupta · June 20, 2023, 11:50pm

Interesting observation. My conjecture is that there is some kind of bias in your test setup that may be getting addressed once there is data from across the device in the post-write-queue. I assume the default of 256 blocks of 128 KiB, which in your case holds 32 MiB, about 0.012% of the 270 GB loaded, all of it the most recently written data. Once you run the 5% update load, you get the benefit of reading from the post-write-queue "cache" data updated from random areas (blocks) of the device, as opposed to only the last 256 blocks of writes in the first case. Can you repeat your test with post-write-queue set to 0? It's a dynamic config parameter.
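
As a rough check on the sizes involved, the sketch below works out how much data a post-write-queue of 256 blocks of 128 KiB holds against the 270 GB quoted in this thread, and builds the command string for changing the setting at runtime. The helper names are hypothetical, 270 GB is assumed to mean decimal gigabytes, and the `asinfo` `set-config` string follows the usual form for dynamic namespace parameters; please verify it against the documentation for your server version before running it.

```python
KIB = 1024
GB = 10 ** 9  # assumption: the 270 GB in the thread is decimal gigabytes


def post_write_queue_bytes(blocks: int = 256,
                           write_block_size: int = 128 * KIB) -> int:
    """Bytes held by the post-write-queue of one device."""
    return blocks * write_block_size


def cached_fraction(queue_bytes: int, data_bytes: int) -> float:
    """Fraction of the stored data that fits in the post-write-queue."""
    return queue_bytes / data_bytes


def set_post_write_queue_command(namespace: str, blocks: int) -> str:
    """Command string only; nothing is executed here (hypothetical helper)."""
    return ('asinfo -v "set-config:context=namespace;'
            f'id={namespace};post-write-queue={blocks}"')


if __name__ == "__main__":
    queue = post_write_queue_bytes()
    print(f"queue size: {queue // (KIB * KIB)} MiB")
    print(f"share of data: {cached_fraction(queue, 270 * GB):.4%}")
    print(set_post_write_queue_command("ycsb", 0))
```

The queue is tiny relative to the data set, so what matters under this conjecture is which blocks it holds (the tail of the load versus blocks scattered across the device), not how much of the data it covers.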
