Why is quiescing a node apparently not working?

FAQ - Why is quiescing a node not working as expected?

Details

When following the standard quiescing operating procedure, clients still receive exceptions when the node is shut down, defeating the purpose of quiescing.

Answers

One potential cause for this situation is a client that is unable to properly refresh its partition map. The stale partition map would then continue to be used and the quiesced node would still be accessed, causing proxies and, when the quiesced node is shut down, errors. It is therefore important to check the network and rule out potential issues with older client library versions.

Check the proxy_in_progress and client_proxy_complete statistics for evidence of proxies occurring prior to shutting down a quiesced node.
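As a rough illustration (not official tooling), the semicolon-delimited `name=value` statistics string returned by an info request can be parsed to spot proxy activity before shutting the node down. The sample response fragment below is fabricated for the example:

```python
def parse_stats(raw: str) -> dict:
    """Parse a semicolon-delimited 'name=value' statistics string."""
    stats = {}
    for pair in raw.split(";"):
        if "=" in pair:
            name, value = pair.split("=", 1)
            stats[name.strip()] = value.strip()
    return stats

# Fabricated sample of a statistics response fragment.
sample = "client_proxy_complete=12;client_proxy_error=0;proxy_in_progress=3"
stats = parse_stats(sample)

# Non-zero values here suggest proxies are still going on, so it may not yet be
# safe to shut down the quiesced node.
if int(stats["proxy_in_progress"]) > 0 or int(stats["client_proxy_complete"]) > 0:
    print("proxy activity detected - wait before shutting down the quiesced node")
```

Watching these values settle to zero (and stay there) between quiescing and shutting down is one way to confirm clients have picked up the refreshed partition map.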

There is another situation where a quiesced node can be hit: when the cluster is configured rack-aware and the client uses the PREFER_RACK ReplicaPolicy. In such cases, against older server versions, the client will keep hitting the quiesced node, which then proxies to the correct node, causing errors and connection exceptions when it is shut down.
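For reference, a minimal sketch of what such a rack-aware client configuration might look like with the Aerospike Python client. The host and rack id values are assumptions, and a string placeholder stands in for the `aerospike.POLICY_REPLICA_PREFER_RACK` constant so the snippet stays self-contained without the client module installed:

```python
# Sketch of a rack-aware client configuration (assumed values throughout).
# With the aerospike Python client installed, the read replica policy would be
# set to aerospike.POLICY_REPLICA_PREFER_RACK instead of the placeholder string.
config = {
    "hosts": [("10.0.0.1", 3000)],  # assumed seed node address
    "rack_aware": True,             # enable rack awareness on the client
    "rack_id": 1,                   # assumed rack this client sits in
    "policies": {
        "read": {
            "replica": "PREFER_RACK",  # placeholder for aerospike.POLICY_REPLICA_PREFER_RACK
        },
    },
}
```

With a configuration along these lines, reads prefer replicas in the client's own rack, which is exactly the behavior that could route traffic to a quiesced node on server versions prior to the fix below.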

Server version 4.6.0.2 addresses this issue:

[AER-6078] - (BALANCE) Working masters (also) report ownership of appropriate replicas in client partition map (e.g. to optimize rack-aware client reads in certain situations).

Notes

In addition, there are a couple of issues related to quiescing causing redundant migrations:

[AER-6012] - (MIGRATION) For AP namespaces, there may be redundant migrations when quiescing multiple nodes at once and later shutting them down one by one. Fixed in version 4.3.1.11.

[AER-6035] - (BALANCE) For AP namespaces with `prefer-uniform-balance` true, there may be redundant migrations after shutting down a quiesced node. Fixed in version 4.5.2.1.

Keywords

QUIESCE MIGRATION CLIENT EXCEPTION

Timestamp

August 2019

© 2015 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.