High Read Latency during Heavy Writes


I'm using Aerospike (a 15-node cluster) as a backend datastore in the following setup. A service serves data out of it at roughly 3K read QPS, varying with the time of day. Once a day, a batch job writes to the same Aerospike cluster at about 60K QPS for 2-3 hours. My problem is that while the writes are running, P95 read latency climbs to about 1 second. I checked the machine stats, and CPU, RAM, disk, and network usage all look well within limits. I am using the Java client for both reads and writes.

One very weird observation: restarting the Aerospike cluster fixes the problem for about 12 hours. That is counterintuitive, because a restart triggers migrations, which I would expect to add load rather than relieve it.
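In case it helps anyone reproduce the measurement, here is a minimal sketch of how a P95 figure can be computed from client-side read timings. It is plain Java with no Aerospike dependency. The class name `LatencyPercentile`, the nearest-rank method, and the sample values are all assumptions made for illustration, not necessarily how the numbers above were collected.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the Aerospike Java client.
final class LatencyPercentile {

    // Nearest-rank percentile: the smallest sample such that at least
    // pct percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMillis, double pct) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (pct <= 0 || pct > 100) {
            throw new IllegalArgumentException("pct must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up read latencies in milliseconds: mostly fast, two slow outliers.
        long[] samples = {
            2, 3, 2, 4, 3, 2, 5, 3, 2, 4,
            3, 2, 4, 3, 2, 5, 3, 2, 1000, 1000
        };
        System.out.println("P95 read latency (ms): " + percentile(samples, 95));
    }
}
```

With 20 samples, the nearest-rank P95 is the 19th smallest value, so even two slow reads out of twenty are enough to push P95 to the slow value. At around 3K read QPS, that means a P95 of 1 second corresponds to at least 5% of reads (on the order of 150 per second) being that slow.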