Why is quiescing a node apparently not working?


When following the standard quiescing operating procedure, clients still get exceptions when the node is shut down, defeating the purpose of quiescing.
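For reference, the standard procedure consists of a quiesce info command followed by a recluster, issued before shutting the node down. The sketch below only assembles and prints the commands (host and port are placeholder assumptions) rather than executing them against a live node:

```shell
# Sketch of the standard quiesce procedure. The host/port below are
# placeholders; the commands are printed, not executed.
HOST="127.0.0.1"
PORT="3000"

QUIESCE_CMD="asinfo -h $HOST -p $PORT -v 'quiesce:'"
RECLUSTER_CMD="asinfo -h $HOST -p $PORT -v 'recluster:'"

echo "$QUIESCE_CMD"
echo "$RECLUSTER_CMD"
```

After the recluster, clients that refresh their partition map correctly will stop sending requests to the quiesced node, at which point it can be shut down safely.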


One potential cause for this situation is the client failing to properly refresh its partition map. The stale partition map would still route requests to the quiesced node, causing proxies, and then errors when the quiesced node is shut down. It is therefore important to check for network issues and/or older client library versions.

Check the `proxy_in_progress` and `client_proxy_complete` statistics for evidence of ongoing proxies prior to shutting down a quiesced node.
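These statistics can be pulled from the namespace statistics output (for example via `asinfo -v 'namespace/<ns>'`). The sketch below hard-codes a sample statistics line so the parsing runs standalone; the values shown are illustrative only:

```shell
# Sample namespace statistics line (hard-coded; on a live cluster this
# would come from e.g.: asinfo -v 'namespace/test').
STATS="objects=1000;proxy_in_progress=0;client_proxy_complete=42;client_proxy_error=0"

get_stat() {
    # Extract one stat value from the semicolon-separated key=value list.
    echo "$STATS" | tr ';' '\n' | awk -F= -v k="$1" '$1 == k { print $2 }'
}

echo "proxy_in_progress=$(get_stat proxy_in_progress)"
echo "client_proxy_complete=$(get_stat client_proxy_complete)"
```

A non-zero `proxy_in_progress`, or a `client_proxy_complete` counter that keeps increasing after quiescing, suggests clients are still reaching the quiesced node.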

There is another situation where a quiesced node would be hit: if the cluster is rack-aware and the client is using the `PREFER_RACK` replica policy. In such cases, against older server versions, the client will continue hitting the quiesced node, which then proxies to the correct node, causing errors and connection exceptions when the quiesced node is shut down.
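For context, rack awareness is enabled on the server side with the `rack-id` namespace configuration parameter (the client side additionally sets its rack id and the `PREFER_RACK` replica policy). A minimal server-side sketch, with placeholder values:

```
namespace test {
    replication-factor 2
    rack-id 1            # this node's rack; nodes in other racks use different ids
    # ... other namespace settings
}
```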

This issue is addressed by the following server improvement:

[AER-6078] - (BALANCE) Working masters (also) report ownership of appropriate replicas in client partition map (e.g. to optimize rack-aware client reads in certain situations).


In addition, there are a couple of issues where quiescing can cause redundant migrations:

[AER-6012] - (MIGRATION) For AP namespaces, there may be redundant migrations when quiescing multiple nodes at once and later shutting them down one by one. Fixed in a subsequent server version.

[AER-6035] - (BALANCE) For AP namespaces with `prefer-uniform-balance` true, there may be redundant migrations after shutting down a quiesced node. Fixed in a subsequent server version.
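When shutting quiesced nodes down one by one, it is worth confirming that migrations have completed before proceeding to the next node. A small sketch checking the `migrate_partitions_remaining` namespace statistic (sample data hard-coded so it runs standalone; on a live cluster the line would come from e.g. `asinfo -v 'namespace/test'`):

```shell
# Sample namespace statistics line (hard-coded for illustration).
NS_STATS="ns_cluster_size=4;migrate_partitions_remaining=0;objects=1000"

# Extract the migrate_partitions_remaining value.
remaining=$(echo "$NS_STATS" | tr ';' '\n' \
    | awk -F= '$1 == "migrate_partitions_remaining" { print $2 }')

if [ "$remaining" -eq 0 ]; then
    echo "migrations complete - safe to proceed"
else
    echo "migrations still in progress ($remaining partitions remaining)"
fi
```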




August 2019

© 2015 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.