Why is quiescing a node apparently not working?

FAQ - Why is quiescing a node not working as expected?

Details

When following the standard quiescing operating procedure, clients still receive exceptions when the node is shut down, defeating the purpose of quiescing.

Answers

One potential cause for this situation is a client that is unable to properly refresh its partition map. The stale partition map would then continue to be used and the quiesced node would still be accessed, causing proxies and, when the quiesced node is shut down, errors. It is therefore important to check the network and rule out potential issues with older client library versions.

Check the proxy_in_progress and client_proxy_complete statistics for evidence of proxies occurring prior to shutting down a quiesced node.
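As a rough illustration (not official tooling), the semicolon-delimited `name=value` statistics string returned by an info request can be parsed to spot proxy activity before shutting the node down. The sample response fragment below is fabricated for the example:

```python
def parse_stats(raw: str) -> dict:
    """Parse a semicolon-delimited 'name=value' statistics string."""
    stats = {}
    for pair in raw.split(";"):
        if "=" in pair:
            name, value = pair.split("=", 1)
            stats[name.strip()] = value.strip()
    return stats

# Fabricated sample of a statistics response fragment.
sample = "client_proxy_complete=12;client_proxy_error=0;proxy_in_progress=3"
stats = parse_stats(sample)

# Non-zero values here suggest proxies are still going on, so it may not yet be
# safe to shut down the quiesced node.
if int(stats["proxy_in_progress"]) > 0 or int(stats["client_proxy_complete"]) > 0:
    print("proxy activity detected - wait before shutting down the quiesced node")
```

Watching these values settle to zero (and stay there) between quiescing and shutting down is one way to confirm clients have picked up the refreshed partition map.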

There is another situation where a quiesced node can be hit: when the cluster is configured rack-aware and the client uses the PREFER_RACK ReplicaPolicy. In such cases, against older server versions, the client will keep hitting the quiesced node, which then proxies to the correct node, causing errors and connection exceptions when it is shut down.
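For reference, a minimal sketch of what such a rack-aware client configuration might look like with the Aerospike Python client. The host and rack id values are assumptions, and a string placeholder stands in for the `aerospike.POLICY_REPLICA_PREFER_RACK` constant so the snippet stays self-contained without the client module installed:

```python
# Sketch of a rack-aware client configuration (assumed values throughout).
# With the aerospike Python client installed, the read replica policy would be
# set to aerospike.POLICY_REPLICA_PREFER_RACK instead of the placeholder string.
config = {
    "hosts": [("10.0.0.1", 3000)],  # assumed seed node address
    "rack_aware": True,             # enable rack awareness on the client
    "rack_id": 1,                   # assumed rack this client sits in
    "policies": {
        "read": {
            "replica": "PREFER_RACK",  # placeholder for aerospike.POLICY_REPLICA_PREFER_RACK
        },
    },
}
```

With a configuration along these lines, reads prefer replicas in the client's own rack, which is exactly the behavior that could route traffic to a quiesced node on server versions prior to the fix below.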

Server version 4.6.0.2 addresses this issue:

[AER-6078] - (BALANCE) Working masters (also) report ownership of appropriate replicas in client partition map (e.g. to optimize rack-aware client reads in certain situations).

Notes

In addition, there are a couple of issues related to quiescing causing redundant migrations:

[AER-6012] - (MIGRATION) For AP namespaces, there may be redundant migrations when quiescing multiple nodes at once and later shutting them down one by one. Fixed in version 4.3.1.11.

[AER-6035] - (BALANCE) For AP namespaces with `prefer-uniform-balance` true, there may be redundant migrations after shutting down a quiesced node. Fixed in version 4.5.2.1.

Keywords

QUIESCE MIGRATION CLIENT EXCEPTION

Timestamp

August 2019

© 2015 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.