Higher Latencies on a Few Particular Nodes

Hi,

We are running a 4-node Aerospike cluster in AWS. We use the i3.2xlarge instance type, which comes with 32 GB of RAM and a 900 GB ephemeral volume, and we also use a shadow EBS volume to replicate the writes. All of these instances are in the same AWS region and availability zone. We had been running this cluster without any issues for around a year.

But over the past month, we have seen higher latencies on one particular instance. The read, write, and UDF latencies of the other 3 nodes remain normal, while that one node alone shows abrupt peaks many times the normal average. We thought this might be a local hardware issue with the EC2 instance, so we tried replacing the node, but the newly replaced node showed the same higher latencies. We compared configs, and they were identical across all 4 nodes. We also compared benchmarks, microbenchmarks, etc., and in all of them only that one node showed an anomaly while the other nodes looked fine.
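For reference, that cross-node comparison can be done with the bundled tools roughly as below; the exact command modifiers may differ slightly between tools versions, and the node address is a placeholder.

    # Show only the config values that differ between nodes
    asadm -e "show config diff"

    # Per-node read/write/udf latency histograms, side by side
    asadm -e "show latency"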

These higher latencies haven’t caused any issues as such, but we really want to understand the reason for this unexplained behaviour on that one node. We have gone through various blogs, suggestions, and documentation, and have exhausted the resources we could find. We also tried replacing one of the 3 nodes that had lower latencies, and the newly introduced node is now showing higher latency as well.

Hence, we are looking for ideas on how to debug this, resolution steps, things to watch out for, etc., to find the reason for the increased latencies on these particular nodes.

We are using Aerospike version 3.13, and our system has services using the Go, PHP, and Java clients.


Hey Janardhanan,

The sunset for v3.13 was over 2 years ago. I’d recommend upgrading your version.

And some additional recommendations:


Turning on microbenchmarks would indeed help narrow down where the latency is coming from on that node. Other than that, I would recommend a thorough analysis. The best approach is to graph all metrics and look for patterns, specifically around workloads (reads, writes, and their success/not-found/failure rates) and around background tasks that could be weighted differently on the node behaving differently, such as defragmentation and nsup cycles. It’s hard to provide more input without analyzing the full logs.
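To make that concrete, here is a rough sketch of enabling the per-transaction benchmarks dynamically on the suspect node and pulling the resulting histograms out of its log on a 3.x server. The host and namespace names are placeholders, so double-check the exact parameter and histogram names against the docs for your build.

    # Enable detailed read/write benchmarks on the suspect node only
    # (namespace "test" and the host names are placeholders)
    asinfo -h <suspect-node> -v "set-config:context=namespace;id=test;enable-benchmarks-read=true"
    asinfo -h <suspect-node> -v "set-config:context=namespace;id=test;enable-benchmarks-write=true"

    # Slice the benchmark histograms out of the suspect node's log over time
    asloglatency -l /var/log/aerospike/aerospike.log -h "{test}-read"
    asloglatency -l /var/log/aerospike/aerospike.log -h "{test}-write"

    # Compare defrag/nsup-related namespace statistics between a healthy node and the suspect one
    asinfo -h <healthy-node> -v "namespace/test"
    asinfo -h <suspect-node> -v "namespace/test"

Graphing these per-node over time (rather than comparing single snapshots) is usually what exposes a pattern such as defrag load or nsup cycles lining up with the latency spikes.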

@Janardhanan_V_S You have described the problem well. I never expected this work to be so easy. Thanks for doing this. :slightly_smiling_face:
