Namespace not cleaned up properly after migration


#1

Hi. The problem occurs after we add a new, clean node to the cluster.

When replication is done, the replica and master object counts differ. The namespace is not balanced, and the number of master objects continues to grow.

We had the identical problem on earlier releases, starting from 3.6.1.

Our earlier solution was to run ‘cluster dun all’ and then ‘cluster undun all’. A new migration process started, and the problem disappeared once the migration ended.

But in the latest version we can't run these commands.

https://discuss.aerospike.com/t/asadm-cluster-dun-all-invalid-command-or-could-not-connect-to-node/3775

aerospike-server-community 3.11.0.2-1 aerospike-tools 3.11.0


#2

Dun was removed in 3.9.1 because it was no longer necessary due to the enhanced paxos algorithm. The cluster should auto-heal and auto-rebalance.

A few questions:

  • Are all your nodes on the same version? (3.11.0.2)
  • What is your paxos-recovery-policy set to? It should be auto-reset-master, which is the default in the version you’re using.
  • If you run asadm -e info, do all nodes agree on the size of the cluster, and do they all have cluster visibility true?
  • Do you have any errors in your logs, particularly network errors?
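As a rough illustration of the cluster-size check, here is a minimal Python sketch. It assumes you have already collected each node's statistics as the semicolon-separated `key=value` text that the info protocol returns, and that the text contains a `cluster_size` entry. The node names and values in `sample` are made up for the example, and the helper names are hypothetical.

```python
def parse_info(stats: str) -> dict:
    """Parse a 'key=value;key=value' info response into a dict."""
    pairs = (item.split("=", 1) for item in stats.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def cluster_size_views(stats_by_node: dict) -> dict:
    """Map each node name to the cluster size that node reports."""
    return {
        node: int(parse_info(stats)["cluster_size"])
        for node, stats in stats_by_node.items()
    }


def nodes_agree(stats_by_node: dict) -> bool:
    """True when every node reports the same cluster size."""
    return len(set(cluster_size_views(stats_by_node).values())) == 1


# Hypothetical sample data: node-c sees a smaller cluster than the others.
sample = {
    "node-a": "cluster_size=3;objects=1000",
    "node-b": "cluster_size=3;objects=990",
    "node-c": "cluster_size=2;objects=400",
}

print(cluster_size_views(sample))
print(nodes_agree(sample))
```

If the nodes disagree, the node reporting the odd value is a good place to start looking at logs and network connectivity.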

I suspect you have network issues between your nodes, caused either by the network itself or by a misconfigured Aerospike. Can you share some information about your deployment: (a) bare metal vs. cloud, (b) the number of NICs in the nodes, and (c) how those NICs are used?

I would also note that your stop-writes, high-water-mark-memory and high-water-mark-disk parameters are set oddly. These are typically 90%, 60% and 50% respectively; yours are 80%, 80% and 99%. Misconfiguring these has serious ramifications, so be aware of what they are before they bite you in production.
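To make the comparison concrete, here is a small Python sketch that lists which thresholds differ from the typical values quoted above. The dictionary keys are just the informal labels used in this post, not necessarily the exact configuration parameter names, and the `deviations` helper is hypothetical.

```python
# Typical values as quoted in this post (percent).
TYPICAL = {
    "stop-writes": 90,
    "high-water-mark-memory": 60,
    "high-water-mark-disk": 50,
}


def deviations(configured: dict) -> dict:
    """Return {label: (configured, typical)} for every threshold that differs."""
    return {
        label: (configured.get(label), typical)
        for label, typical in TYPICAL.items()
        if configured.get(label) != typical
    }


# The values reported for this cluster.
yours = {
    "stop-writes": 80,
    "high-water-mark-memory": 80,
    "high-water-mark-disk": 99,
}

for label, (actual, typical) in deviations(yours).items():
    print(f"{label}: configured {actual}%, typical {typical}%")
```

A value that differs from the typical one is not automatically wrong, but each difference should be a deliberate choice.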