Node will not join cluster after upgrade
When a cluster is being upgraded from Aerospike 3.13 to a version later than Aerospike 3.14, the first node to be upgraded starts but does not join the cluster. Instead, it forms a single-node cluster on its own. The aerospike.log shows messages such as the following:
Aug 13 2019 08:40:27 GMT: WARNING (hb): (hb.c:4647) (repeated:9) unable to parse heartbeat message on fd 71
Aug 13 2019 08:40:27 GMT: WARNING (hb): (hb.c:4647) (repeated:4) unable to parse heartbeat message on fd 74
Aug 13 2019 08:40:27 GMT: WARNING (hb): (hb.c:4647) (repeated:24) unable to parse heartbeat message on fd 68
These messages will display even when there have been no changes to mesh config or routing.
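To confirm the symptom, the warnings can be counted per file descriptor. The following is a minimal Python sketch, assuming the log line format shown above. The function name count_heartbeat_parse_failures is hypothetical, and the sketch assumes that a "(repeated:N)" prefix stands for N occurrences of the same warning.

```python
import re
from collections import Counter

# Matches warnings of the form shown above, for example:
#   WARNING (hb): (hb.c:4647) (repeated:9) unable to parse heartbeat message on fd 71
# The "(repeated:N)" part is optional; group 1 is N, group 2 is the fd.
HB_PARSE_WARNING = re.compile(
    r"WARNING \(hb\):.*?(?:\(repeated:(\d+)\) )?"
    r"unable to parse heartbeat message on fd (\d+)"
)


def count_heartbeat_parse_failures(log_text: str) -> Counter:
    """Return a Counter mapping fd number -> number of heartbeat parse failures.

    Assumption: "(repeated:N)" is counted as N occurrences; a warning
    without that prefix counts as one.
    """
    counts: Counter = Counter()
    for match in HB_PARSE_WARNING.finditer(log_text):
        repeated, fd = match.groups()
        counts[int(fd)] += int(repeated) if repeated else 1
    return counts


if __name__ == "__main__":
    sample = (
        "Aug 13 2019 08:40:27 GMT: WARNING (hb): (hb.c:4647) (repeated:9) "
        "unable to parse heartbeat message on fd 71\n"
        "Aug 13 2019 08:40:27 GMT: WARNING (hb): (hb.c:4647) (repeated:4) "
        "unable to parse heartbeat message on fd 74\n"
    )
    for fd, n in sorted(count_heartbeat_parse_failures(sample).items()):
        print(f"fd {fd}: {n} parse failures")
```

A steady stream of these warnings from the same peers, with no configuration changes, points at a protocol mismatch rather than a network fault.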
The inability to parse heartbeat messages indicates that the upgraded node and the remaining nodes are using different cluster protocols. Aerospike 3.13 introduced a new cluster protocol that brought large improvements to cluster performance; however, the new protocol is not compatible with the one used by previous versions. The switch does not happen automatically as part of the Aerospike 3.13 upgrade; it must be done after the upgrade by running a script. If, for some reason, the script has not been run, the cluster is still using the old protocol. Versions of Aerospike later than 3.14 must use the new protocol, so a node upgraded to one of those versions cannot exchange heartbeats with nodes still on the old protocol.
Old nodes will show the old cluster protocol as follows:
Admin> show config like protocol
~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Configuration~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE              :   172.17.0.3:3000   172.17.0.4:3000   86804bde1c48:3000
heartbeat.protocol:   v2                v2                v2
paxos-protocol    :   v3                v3                v3
~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Configuration~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE              :   172.17.0.3:3000   172.17.0.4:3000   86804bde1c48:3000
heartbeat.protocol:   v2                v2                v2
Admin>
The upgraded node will show:
Admin> show config like protocol
~~~~~~~~~~~~~~Service Configuration (2019-08-13 17:16:02 UTC)~~~~~~~~~~~~~~
NODE              :   172.17.0.6:3000
heartbeat.protocol:   v3
~~~~~~~~~~~~~~Network Configuration (2019-08-13 17:16:02 UTC)~~~~~~~~~~~~~~
NODE              :   172.17.0.6:3000
heartbeat.protocol:   v3
Admin>
The heartbeat protocol differs (v3 on the upgraded node, v2 on the old nodes), and the upgraded node does not show paxos-protocol because that parameter is now deprecated.
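Comparing the two outputs by eye works for a few nodes; for larger clusters the comparison can be scripted. The following is a minimal Python sketch, assuming the column layout of the show config like protocol output above (a NODE row followed by one value per node). The function names heartbeat_protocols and protocols_consistent are hypothetical.

```python
from __future__ import annotations


def heartbeat_protocols(asadm_output: str) -> dict[str, str]:
    """Map each node to its heartbeat.protocol value.

    Assumes the column layout shown above: a "NODE : ..." row listing the
    nodes, followed by "<parameter>: <one value per node>" rows.
    """
    protocols: dict[str, str] = {}
    nodes: list[str] = []
    for line in asadm_output.splitlines():
        # Split on the first colon only; node addresses such as
        # 172.17.0.3:3000 contain colons of their own.
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # e.g. "Admin> show config like protocol"
        key = key.strip()
        if key == "NODE":
            nodes = rest.split()
        elif key == "heartbeat.protocol":
            protocols.update(zip(nodes, rest.split()))
        # Other rows (paxos-protocol, section banners) are ignored.
    return protocols


def protocols_consistent(protocols: dict[str, str]) -> bool:
    """True when every node reports the same heartbeat protocol."""
    return len(set(protocols.values())) <= 1


if __name__ == "__main__":
    old_nodes = heartbeat_protocols(
        "NODE              :   172.17.0.3:3000   172.17.0.4:3000\n"
        "heartbeat.protocol:   v2                v2\n"
    )
    upgraded = heartbeat_protocols(
        "NODE              :   172.17.0.6:3000\n"
        "heartbeat.protocol:   v3\n"
    )
    cluster = {**old_nodes, **upgraded}
    print(cluster, "consistent:", protocols_consistent(cluster))
```

Merging the old nodes (v2) with the upgraded node (v3) reports an inconsistent cluster, which is the situation that produces the heartbeat parse warnings.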
The simplest solution is to run the cluster protocol script on the remaining nodes in the cluster and then continue with the upgrade. The steps are as follows:
- Shut down the upgraded node.
- Allow migrations to finish on the remaining nodes.
- When migrations have finished, run the script as described in the Aerospike 3.13 Special Upgrade Instructions.
- When the cluster protocol has been changed and verified, restart the upgraded node.
- Continue with the upgrade as planned.
- All Aerospike special upgrade instructions can be found on the Special Upgrades documentation page.
- The current cluster protocol version can be checked before an upgrade using the asadm command-line tool.
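A pre-upgrade check can turn the heartbeat.protocol values into a go/no-go decision. The following is a minimal Python sketch under stated assumptions: the new protocol is identified by heartbeat.protocol v3 (as in the outputs above), and, conservatively, any target newer than 3.13 is treated as requiring it, since 3.13 is the release in which the switch is made. The names nodes_blocking_upgrade and parse_version are hypothetical.

```python
from __future__ import annotations

# Conservative assumption: 3.13 is the transition release, so any target
# newer than 3.13 is treated as requiring the new cluster protocol.
OLD_PROTOCOL_SUPPORTED_UP_TO = (3, 13)
# Assumption based on the outputs above: the new protocol reports "v3".
NEW_HEARTBEAT_PROTOCOL = "v3"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "4.5.3.2" into (4, 5, 3, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def nodes_blocking_upgrade(target_version: str, protocols: dict[str, str]) -> list[str]:
    """Return nodes still on the old heartbeat protocol, if the target needs the new one.

    An empty list means the cluster protocol does not block this upgrade.
    """
    if parse_version(target_version)[:2] <= OLD_PROTOCOL_SUPPORTED_UP_TO:
        return []  # target is 3.13 or older: no protocol switch required yet
    return sorted(
        node for node, proto in protocols.items() if proto != NEW_HEARTBEAT_PROTOCOL
    )


if __name__ == "__main__":
    cluster = {"172.17.0.3:3000": "v2", "172.17.0.4:3000": "v2"}
    blocking = nodes_blocking_upgrade("4.5.3.2", cluster)
    if blocking:
        print("Run the 3.13 protocol switch first; still on old protocol:", blocking)
    else:
        print("Cluster protocol does not block this upgrade.")
```

The protocol map can be filled from the output of asadm -e "show config like protocol" on each cluster before the first node is upgraded. A non-empty result corresponds to the situation described in this article.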