Asmonitor, asadm, and/or AMC are reporting “cluster visibility” as false on one or more nodes. This occurs when the list of nodes seen by the tools contains addresses that are not in the list of addresses returned by asinfo -v services on that particular node, or when the services lists do not match across the nodes of the cluster. This does not necessarily indicate that the cluster is not fully operational, but it is nevertheless recommended to identify the cause and address it.
Starting with the version of tools shipped with Aerospike version 3.7.5, asadm (0.0.17) will only show “cluster integrity” instead of “cluster visibility”. However, if there are cluster visibility issues (mismatch of services lists across the nodes in the cluster), asadm will print a warning when launched.
The false reading may be caused by a minor cache issue in the monitoring tool. Try exiting the tool and relaunching it to verify whether cluster visibility is consistently false.
The following two commands should help identify why cluster visibility is false:
asadm -e "asinfo -v service"
This command returns the service address(es) broadcast by each node in the cluster.
asadm -e "asinfo -v services"
This command returns the list of neighbor addresses that each node is reporting.
Let’s now look at the different causes of cluster visibility being false.
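The comparison behind these two commands can be sketched in Python. This is a minimal illustration, not the logic the tools actually implement. It assumes the values have already been collected into two dictionaries keyed by a node label of your choosing, and that each value is the semicolon-separated list of ip:port entries that the info commands return. The function names and the sample addresses are hypothetical.

```python
def parse_addresses(raw):
    """Split a semicolon-separated info value such as 'a:3000;b:3000' into a set."""
    return {item.strip() for item in raw.split(";") if item.strip()}


def visibility_report(service_by_node, services_by_node):
    """Compare each node's `services` list with the addresses its peers advertise.

    service_by_node:  node label -> raw value of `asinfo -v service`
    services_by_node: node label -> raw value of `asinfo -v services`
    """
    own = {node: parse_addresses(raw) for node, raw in service_by_node.items()}
    all_addresses = set().union(*own.values()) if own else set()

    report = {}
    for node, own_addresses in own.items():
        expected = all_addresses - own_addresses      # every peer, excluding the node itself
        reported = parse_addresses(services_by_node.get(node, ""))
        missing = expected - reported                 # peers this node does not list
        extra = reported - expected                   # entries no current node advertises
        report[node] = {
            "visible": not missing and not extra,
            "missing": missing,
            "extra": extra,
        }
    return report


# Hypothetical sample data: n3 does not list 10.0.0.2 as a neighbor.
service = {"n1": "10.0.0.1:3000", "n2": "10.0.0.2:3000", "n3": "10.0.0.3:3000"}
services = {
    "n1": "10.0.0.2:3000;10.0.0.3:3000",
    "n2": "10.0.0.1:3000;10.0.0.3:3000",
    "n3": "10.0.0.1:3000",
}
for node, entry in visibility_report(service, services).items():
    print(node, entry)
```

A node whose "missing" or "extra" set is non-empty is a candidate for the causes described in the rest of this article.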
Some nodes may be reporting more than one
If any node reports multiple service addresses, cluster visibility will be false because the cluster visibility indicator in the tools does not support this configuration. This should also cause all nodes represented by the tool to report false. Many clients do not support this configuration either, so it may indicate a misconfigured server, in which case you will need to configure access-address in the network.service context of
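As a quick check for this cause, the following Python sketch flags nodes whose service value contains more than one address. It assumes the values were collected beforehand into a dictionary keyed by a node label, and that multiple addresses are separated by semicolons. The function name and sample addresses are hypothetical.

```python
def nodes_with_multiple_service_addresses(service_by_node):
    """Return {node label: [addresses]} for nodes advertising more than one service address.

    service_by_node: node label -> raw value of `asinfo -v service`
    """
    flagged = {}
    for node, raw in service_by_node.items():
        addresses = [item.strip() for item in raw.split(";") if item.strip()]
        if len(addresses) > 1:
            flagged[node] = addresses
    return flagged


# Hypothetical sample data: n2 advertises two addresses.
sample = {
    "n1": "10.0.0.1:3000",
    "n2": "10.0.0.2:3000;192.168.1.2:3000",
}
print(nodes_with_multiple_service_addresses(sample))
```

An empty result suggests that this cause can be ruled out.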
Subset or extra IPs in the services list
One or more nodes in the cluster may be advertising only a subset of their peer nodes’ access-addresses, or nodes may be advertising the access-address of a node that has departed from the cluster. Currently there is no easy way to verify that this is the case or to determine which nodes are missing or are present when they should not be. However, if nodes are leaving the cluster or a node has recently joined, it could potentially cause such issues.
You can run the following command, compare the list of IPs reported by each node, and check whether the number of values differs:
asadm -e "asinfo -v services"
For release 3.7 or later, set paxos-recovery-policy to auto-reset-master if its value is “manual”.
Among the lists returned, a few may be shorter (or longer) than the rest, which lets you identify the node causing the problem. In general, if most nodes report all of their peers and the nodes reporting false are missing only a small fraction of their peers, no action is required: the clients are able to work around this issue, and it can be treated as a false negative. If a large number of peers are missing from the services list, you can either dun/undun the missing node on the nodes reporting false (option 1) or dun/undun the node reporting incorrect values across the cluster (option 2).
Note: Doing a dun/undun on the cluster will trigger migrations as the cluster rebalances.
# Option 1:
asadm -e "cluster dun [Missing IP] with [IP returning false visibility]"
# Option 2:
asadm -e "cluster dun [IP returning false visibility]"
# Then run the following command after waiting for a few seconds:
asadm -e "cluster undun [Missing IP / IP returning false visibility]"
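Before choosing between the two options, it can help to see which nodes report a services list that is shorter or longer than the rest. The Python sketch below counts the entries per node and reports every node whose count differs from the most common count. It assumes the services values were collected into a dictionary keyed by a node label. The function name and sample data are hypothetical, and treating the most common count as the "typical" one is a simplification that only holds when most nodes agree.

```python
from collections import Counter


def services_count_outliers(services_by_node):
    """Return {node label: delta} for nodes whose services list length is atypical.

    services_by_node: node label -> raw value of `asinfo -v services`
    A negative delta means the list is shorter than the typical list,
    a positive delta means it is longer.
    """
    counts = {
        node: len([item for item in raw.split(";") if item.strip()])
        for node, raw in services_by_node.items()
    }
    if not counts:
        return {}
    # Assumption: the most common list length is the correct one.
    typical = Counter(counts.values()).most_common(1)[0][0]
    return {node: count - typical for node, count in counts.items() if count != typical}


# Hypothetical sample data: n3 lists one neighbor fewer than the others.
sample = {
    "n1": "10.0.0.2:3000;10.0.0.3:3000",
    "n2": "10.0.0.1:3000;10.0.0.3:3000",
    "n3": "10.0.0.1:3000",
}
print(services_count_outliers(sample))
```

Equal counts do not prove that the lists match, so a node that is missing one peer while listing a departed one would not be flagged by this check.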