I am running load tests against Aerospike 3.7.3 Community Edition and have a recurring issue where some of the histograms randomly stop populating, most importantly writes_master. Nothing short of restarting the server reactivates it, but a restart generates migrations (which affects my numbers), and the histogram eventually stops logging again. Running hist-track-stop/start doesn’t help, and neither does changing the logging parameters or turning microbenchmarks on and off.
I can verify that traffic is still passing through the system and if I turn on microbenchmarks or storage-benchmarks I do get some stats, but none of the base histograms are working.
My only logging configuration is “context any info”, and there are no permission issues. The cluster runs in AWS inside our VPC as a 5-node mesh, with SSD storage on the built-in ephemeral drives. For this test, the traffic is 100% writes.
Any ideas?
How are you viewing the latency histograms? With asadm, the asloglatency tool, or AMC?
asadm, asloglatency, and asmonitor all give the same results.
Can you elaborate on the command that you are using? Do the histograms still get populated in the server log (/var/log/aerospike/aerospike.log), or do they stop there too?
Can you paste a sample output of the loglatency which shows the gap in logging:
asloglatency -h writes_master -f head
Sorry, but my test environment is currently spun down and I’m not going to be able to get back to this until late next week at the earliest to get a sample of the behavior.
I can confirm, though, that asloglatency and the “latency” command in asmonitor both show the same results. The histogram counters in /var/log/aerospike/aerospike.log simply stop incrementing, even though there are no errors in the log and I can see that the number of keys continues to increase, which implies continued write activity.
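For anyone wanting to pin down when the counters stall, here is a minimal sketch that scans the server log for histogram dumps whose running total did not grow since the previous dump. The line format in the regular expression is an assumption about how the server writes histogram headers (something like "histogram dump: writes_master (12345 total) msec") and may need adjusting for your version; find_stalls is a hypothetical helper name, not part of any Aerospike tool.

```python
import re

# ASSUMPTION: histogram header lines in aerospike.log look roughly like
#   "... histogram dump: writes_master (12345 total) msec"
# Adjust this pattern if your server version logs a different format.
HIST_RE = re.compile(r"histogram dump: (\S+) \((\d+) total\)")


def find_stalls(lines, hist_name="writes_master"):
    """Return (line_number, total) for each dump whose total did not grow.

    A server restart resets the counter, so the first dump after a restart
    may also be reported here.
    """
    stalls = []
    prev_total = None
    for lineno, line in enumerate(lines, start=1):
        m = HIST_RE.search(line)
        if not m or m.group(1) != hist_name:
            continue  # not a header line for the histogram we care about
        total = int(m.group(2))
        if prev_total is not None and total <= prev_total:
            stalls.append((lineno, total))
        prev_total = total
    return stalls
```

To use it, open /var/log/aerospike/aerospike.log and pass the file object to find_stalls; every tuple it returns marks a dump where the histogram total failed to advance, which can then be compared against the key count over the same period.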
OK, sure. I attempted to reproduce it and was unable to, but it will be interesting to see your reproduction data.