Benchmark throughput drops to 0 when node removed from Aerospike cluster

Benchmark throughput drops to 0 when node removed from Aerospike cluster

Problem Description

When running an Aerospike benchmark against a cluster, when testing node removal, either by shutting down the node or dropping IPtables, the throughput rapidly drops to 0 for reads and writes. The throughput does eventually recover as shown below:

2020-09-14 01:34:13.640 write(tps=10151 timeouts=0 errors=0) read(tps=10160 timeouts=0 errors=0) total(tps=20311 timeouts=0 errors=0)
2020-09-14 01:34:14.640 write(tps=10197 timeouts=0 errors=0) read(tps=10202 timeouts=0 errors=0) total(tps=20399 timeouts=0 errors=0)
2020-09-14 01:34:15.640 write(tps=10172 timeouts=0 errors=0) read(tps=10152 timeouts=0 errors=0) total(tps=20324 timeouts=0 errors=0)
2020-09-14 01:34:16.641 write(tps=6269 timeouts=0 errors=0) read(tps=6396 timeouts=0 errors=0) total(tps=12665 timeouts=0 errors=0)
2020-09-14 01:34:16.920 WARN Thread tend Node A1 10.0.0.1 3000 refresh failed: Error -8: java.net.SocketException: Connection reset
2020-09-14 01:34:17.641 write(tps=56 timeouts=6 errors=20) read(tps=63 timeouts=0 errors=20) total(tps=119 timeouts=6 errors=40)
2020-09-14 01:34:17.924 WARN Thread tend Node A1 10.0.0.1 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2020-09-14 01:34:18.641 write(tps=596 timeouts=25 errors=116) read(tps=577 timeouts=0 errors=108) total(tps=1173 timeouts=25 errors=224)
2020-09-14 01:34:19.642 write(tps=37 timeouts=19 errors=0) read(tps=53 timeouts=0 errors=0) total(tps=90 timeouts=19 errors=0)
2020-09-14 01:34:19.928 WARN Thread tend Node A1 10.0.0.1 3000 refresh failed: Error -8: java.net.SocketTimeoutException: connect timed out
2020-09-14 01:34:19.938 INFO Thread tend Remove node A1 10.0.0.1 3000
2020-09-14 01:34:20.642 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:21.642 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:22.643 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:23.643 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:24.644 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:25.644 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:26.644 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:27.645 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:28.645 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:29.645 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:30.646 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:31.646 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:32.646 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:33.647 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:34.647 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:35.647 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:36.648 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:37.648 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:38.648 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:39.648 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:40.649 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:41.649 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:42.649 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:43.649 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:44.649 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:45.650 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:46.650 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:47.650 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:48.650 write(tps=40 timeouts=0 errors=1) read(tps=29 timeouts=0 errors=0) total(tps=69 timeouts=0 errors=1)
2020-09-14 01:34:49.650 write(tps=6884 timeouts=0 errors=11) read(tps=6826 timeouts=0 errors=0) total(tps=13710 timeouts=0 errors=11)
2020-09-14 01:34:50.651 write(tps=9339 timeouts=0 errors=0) read(tps=9315 timeouts=0 errors=0) total(tps=18654 timeouts=0 errors=0)
2020-09-14 01:34:51.651 write(tps=9330 timeouts=0 errors=0) read(tps=9126 timeouts=0 errors=0) total(tps=18456 timeouts=0 errors=0)
2020-09-14 01:34:52.651 write(tps=9459 timeouts=0 errors=0) read(tps=9298 timeouts=0 errors=0) total(tps=18757 timeouts=0 errors=0)
2020-09-14 01:34:53.651 write(tps=9317 timeouts=0 errors=0) read(tps=9293 timeouts=0 errors=0) total(tps=18610 timeouts=0 errors=0)
2020-09-14 01:34:54.651 write(tps=9302 timeouts=0 errors=0) read(tps=9207 timeouts=0 errors=0) total(tps=18509 timeouts=0 errors=0)
2020-09-14 01:34:55.652 write(tps=8959 timeouts=0 errors=0) read(tps=8963 timeouts=0 errors=0) total(tps=17922 timeouts=0 errors=0)
2020-09-14 01:34:56.652 write(tps=9737 timeouts=0 errors=0) read(tps=9678 timeouts=0 errors=0) total(tps=19415 timeouts=0 errors=0)
2020-09-14 01:34:57.652 write(tps=9780 timeouts=0 errors=0) read(tps=10025 timeouts=0 errors=0) total(tps=19805 timeouts=0 errors=0)
2020-09-14 01:34:58.653 write(tps=9759 timeouts=0 errors=0) read(tps=9669 timeouts=0 errors=0) total(tps=19428 timeouts=0 errors=0)

Explanation

This is expected behaviour and the Aerospike smart client is functioning as expected. The output messages above are printed every 1 second and so the phenomenon at work is hard to spot. If granularity were greater then the observed effect would be that the throughput drops gradually to 0 and then, after around 30 seconds, gradually increases again.

The reason for the drop is because the client threads become stuck trying to execute transactions against a node that is no longer there. When using IPtables to drop the routing, no reset packet is sent to the client, so when the node goes away the client is not aware and sits, waiting to time out the socket. This is the same when the node shuts down.

The client has a fixed 256 threads with which to execute transactions. Those threads with transactions for the removed node are stuck, waiting for the socket to time out. Threads working against other nodes complete quickly but soon pick up transactions for the missing node and also get stuck, waiting for the socket to time out.

The tend thread will notice that the node is not available, however, changes in partition ownership will not be sent until the node has been ejected from the cluster plus a small amount of time known as the quantum interval (usually a couple of seconds plus the time it takes for the tend thread to build a new partition map). With a single client pushing hard this is quite enough time for all 256 threads to pick up transactions for the missing node and get stuck waiting to time out. As the default socket timeout is 30 seconds, the threads will sit for 30 seconds until timing out the transactions. This is why the throughput starts to increase after 30 seconds until it returns to a normal level.

This is illustrated below.

This is the node leaving, the throughput drops and tend threads fail. Note the gradual drop as the threads are steadily consumed with transactions that cannot complete.

2020-09-14 01:34:15.640 write(tps=10172 timeouts=0 errors=0) read(tps=10152 timeouts=0 errors=0) total(tps=20324 timeouts=0 errors=0)
2020-09-14 01:34:16.641 write(tps=6269 timeouts=0 errors=0) read(tps=6396 timeouts=0 errors=0) total(tps=12665 timeouts=0 errors=0)
2020-09-14 01:34:16.920 WARN Thread tend Node A1 10.0.0.1 3000 refresh failed: Error -8: java.net.SocketException: Connection reset
2020-09-14 01:34:17.641 write(tps=56 timeouts=6 errors=20) read(tps=63 timeouts=0 errors=20) total(tps=119 timeouts=6 errors=40)
2020-09-14 01:34:17.924 WARN Thread tend Node A1 10.0.0.1 3000 refresh failed: Error -8: java.net.ConnectException: Connection refused (Connection refused)
2020-09-14 01:34:18.641 write(tps=596 timeouts=25 errors=116) read(tps=577 timeouts=0 errors=108) total(tps=1173 timeouts=25 errors=224)
2020-09-14 01:34:19.642 write(tps=37 timeouts=19 errors=0) read(tps=53 timeouts=0 errors=0) total(tps=90 timeouts=19 errors=0)

The throughput stays at 0 as all 256 threads sit unable to complete and waiting to time out.

2020-09-14 01:34:20.642 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:21.642 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:22.643 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:23.643 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)

...

2020-09-14 01:34:44.649 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:45.650 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:46.650 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
2020-09-14 01:34:47.650 write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)

Threads begin to time out and traffic gradually picks back up to similar throughput.

2020-09-14 01:34:48.650 write(tps=40 timeouts=0 errors=1) read(tps=29 timeouts=0 errors=0) total(tps=69 timeouts=0 errors=1)
2020-09-14 01:34:49.650 write(tps=6884 timeouts=0 errors=11) read(tps=6826 timeouts=0 errors=0) total(tps=13710 timeouts=0 errors=11)
2020-09-14 01:34:50.651 write(tps=9339 timeouts=0 errors=0) read(tps=9315 timeouts=0 errors=0) total(tps=18654 timeouts=0 errors=0)
2020-09-14 01:34:51.651 write(tps=9330 timeouts=0 errors=0) read(tps=9126 timeouts=0 errors=0) total(tps=18456 timeouts=0 errors=0)
2020-09-14 01:34:52.651 write(tps=9459 timeouts=0 errors=0) read(tps=9298 timeouts=0 errors=0) total(tps=18757 timeouts=0 errors=0)

Solution

The solution is simply to tune the socketIdleTimeout for the given environment and use case. These parameters are introduced, specifically, to allow customers to define how to best balance their workload and user expectations with infrastructure capabilities. In this example, if socketIdleTimeout had been configured to a much smaller value the threads would time out more quickly and the chances of a hiatus in transactions as described here is less likely.

Notes

  • This article is an excellent primer on understanding how timeouts and retries work together.
  • Quantum interval is discussed at length in this article.
  • Socket timeout for the Java client is documented here.

Keywords

CLIENT 30 SECONDS DROP THROUGHPUT NODE REMOVAL IPTABLES

Timestamp

September 2020

© 2015 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.