Hello,
We are testing out Aerospike and are trying to figure out how to troubleshoot a slow node. We are aiming for sub-millisecond response times, which we have achieved before; however, the cluster we are testing now is under heavier load than any we have tested before.
We are doing very simple lookups by key. When looking at the asmonitor read-latency stats, I see:
node              timespan                ops/sec  >1ms  >8ms  >64ms
10.1.109.77:3000  02:21:56-GMT->02:22:06  6357.7   0.98  0.00  0.00
10.1.111.71:3000  02:22:01-GMT->02:22:11  6109.4   6.37  3.35  1.00
10.1.111.72:3000  02:22:03-GMT->02:22:13  5932.2   0.91  0.06  0.00
10.1.111.73:3000  02:21:56-GMT->02:22:06  6012.0   0.92  0.00  0.00
As you can see, the second node in the list (10.1.111.71) is performing much worse than the rest. I have compared the logs and don't see that node doing anything unusual. Any ideas for troubleshooting this?
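To make the node comparison less of an eyeball exercise, here is a minimal Python sketch that parses the rows above and flags any node whose >1ms value is well above the cluster median. The `parse` and `outliers` helpers and the 3x cutoff are hypothetical choices for illustration, not part of asmonitor; the rows are pasted in by hand rather than read through any asmonitor API.

```python
import statistics

# Rows pasted by hand from the asmonitor read-latency output above:
# node, timespan, ops/sec, >1ms, >8ms, >64ms
ROWS = """\
10.1.109.77:3000 02:21:56-GMT->02:22:06 6357.7 0.98 0.00 0.00
10.1.111.71:3000 02:22:01-GMT->02:22:11 6109.4 6.37 3.35 1.00
10.1.111.72:3000 02:22:03-GMT->02:22:13 5932.2 0.91 0.06 0.00
10.1.111.73:3000 02:21:56-GMT->02:22:06 6012.0 0.92 0.00 0.00
"""

# Hypothetical cutoff: flag a node whose >1ms value is more than
# this many times the median >1ms value across all nodes.
OUTLIER_FACTOR = 3.0


def parse(text):
    """Return {node: (ops_per_sec, over_1ms, over_8ms, over_64ms)}."""
    stats = {}
    for line in text.strip().splitlines():
        # The timespan column contains no spaces, so split() yields 6 fields.
        node, _timespan, ops, gt1, gt8, gt64 = line.split()
        stats[node] = (float(ops), float(gt1), float(gt8), float(gt64))
    return stats


def outliers(stats, factor=OUTLIER_FACTOR):
    """Return nodes whose >1ms value exceeds factor * the cluster median."""
    median = statistics.median(v[1] for v in stats.values())
    return [node for node, v in stats.items() if v[1] > factor * median]


if __name__ == "__main__":
    stats = parse(ROWS)
    for node in outliers(stats):
        print(f"{node}: >1ms = {stats[node][1]:.2f}")
```

Using the median rather than the mean keeps a single slow node from dragging the baseline up and hiding itself.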
Config:
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}
namespace testing {
    replication-factor 2
    memory-size 8G
    high-water-memory-pct 70
    default-ttl 4d # 4 days, use 0 to never expire/evict.
    storage-engine device {
        device /dev/sdb
        # The 2 lines below optimize for SSD.
        scheduler-mode noop
        write-block-size 128K
        # Use the line below to store data in memory in addition to devices.
        # data-in-memory true
    }
}
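For reference, the settings above work out to a few concrete numbers. This is a minimal Python sketch of the arithmetic, assuming the `G` suffix means GiB (2^30 bytes), that eviction begins once memory use crosses `high-water-memory-pct` of `memory-size`, and that the total transaction thread count is `transaction-queues` × `transaction-threads-per-queue`; the variable names are illustrative only.

```python
# Values copied from the config above.
# Assumptions: "G" means GiB (2**30 bytes) and "d" means 86400-second days.
GIB = 2**30
SECONDS_PER_DAY = 86_400

memory_size = 8 * GIB                # memory-size 8G
high_water_memory_pct = 70           # high-water-memory-pct 70
default_ttl_days = 4                 # default-ttl 4d
transaction_queues = 4               # transaction-queues 4
transaction_threads_per_queue = 4    # transaction-threads-per-queue 4

# Memory use (integer bytes) above which the namespace starts evicting.
high_water_bytes = memory_size * high_water_memory_pct // 100
default_ttl_seconds = default_ttl_days * SECONDS_PER_DAY
total_transaction_threads = transaction_queues * transaction_threads_per_queue

print(f"high-water mark: {high_water_bytes} bytes ({high_water_bytes / GIB:.1f} GiB)")
print(f"default TTL: {default_ttl_seconds} s")
print(f"transaction threads: {total_transaction_threads}")
```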
Thanks in advance.