Can you upgrade to 5.0+ by skipping the 4.9 jump version?

FAQ – Can you upgrade to 5.0+ by skipping the 4.9 jump version?

Detail

Aerospike requires server nodes to be upgraded to version 4.9.0 before they are upgraded to 5.0 and above. This article covers the impact of skipping the 4.9 upgrade step, and the available workaround if a direct upgrade to 5.0 is a requirement for your use case.

Note: It is always recommended to use the latest release of a given version line to benefit from the latest hotfix patches available for that version. For 4.9.0, the latest version at the time of writing is 4.9.0.17.

Answer

Skipping the 4.9 upgrade when upgrading Aerospike in a rolling restart is not supported.

If you upgrade a node directly to 5.0 from a pre-4.9 version, it will report a cluster size of 0 and will be unable to rejoin the original cluster:

Sep 21 2020 21:12:02 GMT: INFO (clustering): (clustering.c:5988) sent cluster join request to bb9030011ac4202
Sep 21 2020 21:12:02 GMT: INFO (hb): (hb.c:5740) removed mesh seed host:172.0.0.1 port 3002
Sep 21 2020 21:12:02 GMT: INFO (hb): (hb.c:8648) node arrived bb9020011ac4202
Sep 21 2020 21:12:02 GMT: INFO (fabric): (fabric.c:2489) fabric: node bb9020011ac4202 arrived
Sep 21 2020 21:12:03 GMT: INFO (clustering): (clustering.c:5794) applied new cluster key 3e45d11ad1cd
Sep 21 2020 21:12:03 GMT: INFO (clustering): (clustering.c:5796) applied new succession list bb9040011ac4202 bb9030011ac4202 bb9020011ac4202
Sep 21 2020 21:12:03 GMT: INFO (clustering): (clustering.c:5798) applied cluster size 3
Sep 21 2020 21:12:03 GMT: INFO (exchange): (exchange.c:2319) data exchange started with cluster key 3e45d11ad1cd
Sep 21 2020 21:12:03 GMT: WARNING (exchange): (exchange.c:2654) abandoned exchange - 5.0+ - can't cluster with pre-4.9 nodes
Sep 21 2020 21:12:11 GMT: INFO (info): (ticker.c:167) NODE-ID bb9040011ac4202 CLUSTER-SIZE 0
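The warning above ("abandoned exchange - 5.0+ - can't cluster with pre-4.9 nodes") can be modeled as a simple compatibility rule. The following Python sketch is a hypothetical illustration, not Aerospike code: it assumes version strings of the form used in this article (for example 4.9.0.17), the helper names parse_version and can_exchange are made up for the example, and it encodes only the rule stated here, ignoring any other version constraints.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a string such as '4.9.0.17' into (4, 9, 0, 17) for ordered comparison."""
    return tuple(int(part) for part in version.split("."))


def can_exchange(node_versions: list[str]) -> bool:
    """Hypothetical check: False when 5.0+ and pre-4.9 nodes would share a cluster.

    Mirrors only the "can't cluster with pre-4.9 nodes" warning; other
    version constraints (not covered in this FAQ) are ignored.
    """
    major_minor = [parse_version(v)[:2] for v in node_versions]
    has_5_plus = any(mm >= (5, 0) for mm in major_minor)
    has_pre_4_9 = any(mm < (4, 9) for mm in major_minor)
    return not (has_5_plus and has_pre_4_9)


# Illustrative version strings only.
print(can_exchange(["4.8.0.5", "4.8.0.5", "5.0.0.3"]))    # False
print(can_exchange(["4.9.0.17", "4.9.0.17", "5.0.0.3"]))  # True
```

In this model, the node in the log above corresponds to the first call: it is on 5.0+ while its peers are still pre-4.9, so the exchange is abandoned and the node stays at cluster size 0.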

For a rolling upgrade, all nodes in the cluster must be running version 4.9 before any node is upgraded to version 5.0.

If shutting down the whole cluster is acceptable in production, another supported option is to stop all the nodes in the cluster and upgrade them directly from version 4.8 to 5.0. For upgrades from earlier versions, refer to the Special Upgrades page for details.
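The two supported paths described above can be summarized as a small decision function. This is a hypothetical Python sketch, not an Aerospike tool: the function plan_upgrade and the step wording are invented for illustration, and it encodes only the rules in this FAQ (rolling upgrades go through 4.9; a full-shutdown upgrade may go straight from 4.8 to 5.0; earlier versions are deferred to the Special Upgrades page, modeled here as an error).

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a string such as '4.8.0.5' into (4, 8, 0, 5)."""
    return tuple(int(part) for part in version.split("."))


def plan_upgrade(current: str, target: str, full_shutdown: bool) -> list[str]:
    """Hypothetical planner encoding only the 4.9 rule described in this FAQ."""
    cur = parse_version(current)[:2]
    tgt = parse_version(target)[:2]
    if tgt < (5, 0) or cur >= (4, 9):
        # The 4.9 jump version does not come into play; other rules may still apply.
        return [f"upgrade every node to {target}"]
    if full_shutdown:
        if cur == (4, 8):
            # Supported only with the whole cluster stopped at the same time.
            return [
                "stop all nodes",
                f"upgrade all nodes to {target}",
                "start all nodes",
            ]
        raise ValueError("upgrading from before 4.8: see the Special Upgrades page")
    # Rolling upgrade: every node must be on 4.9 before any node moves to 5.0+.
    return [
        "rolling upgrade every node to the latest 4.9.0.x",
        f"rolling upgrade every node to {target}",
    ]


# Illustrative version strings only.
for step in plan_upgrade("4.8.0.5", "5.0.0.3", full_shutdown=False):
    print(step)
```

Raising an error for pre-4.8 full-shutdown upgrades is a design choice of the sketch: the article does not describe that path, so the function refuses to guess rather than return steps that may be wrong.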

Subtle difference when skipping version 4.9 on clusters that are both XDR destination and source but do not use the forward feature

One difference between an upgrade through 4.9 (rolling or not) and an upgrade from 4.8 directly to 5.0 (which requires all nodes to be shut down at the same time) concerns the flag that distinguishes records written by XDR clients from records written by regular (non-XDR) clients. This flag, called the xdr-write bit, is present in the client wire protocol. It allows XDR to decide which records need to be processed when forwarding to another destination (when XDR is configured on a destination cluster, that is, a cluster receiving writes from another cluster). Regarding how this information is persisted on the cluster:

  • In versions 4.8 and earlier, the digestlog holds the digests of records that need to be processed, and the fabric protocol between cluster nodes also carries this flag so that all replicas have this information. The bit is not stored with the records.

  • In versions 5.0 and above, this xdr-write bit is part of the record itself and is persisted along with any other record specific metadata.

  • Version 4.9 implements both and carries this information through the fabric protocol as well as in the record’s flat format.
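The consequence of the list above can be expressed as a two-line rule: only records last written by a 4.9 or later node carry a stored xdr-write bit. The Python sketch below is a hypothetical model of that rule, not Aerospike code; the function names are invented, and it deliberately ignores the digestlog and fabric details, which do not survive a full-cluster shutdown as record metadata.

```python
def major_minor(version: str) -> tuple[int, int]:
    """'4.9.0.17' -> (4, 9)."""
    parts = [int(p) for p in version.split(".")]
    return parts[0], parts[1]


def bit_persisted_with_record(version: str) -> bool:
    """Per this FAQ: 4.8 and earlier do not store the bit with the record;
    4.9 and 5.0+ store it in the record metadata."""
    return major_minor(version) >= (4, 9)


def stored_xdr_write_bit(written_by_xdr_client: bool, writer_version: str) -> bool:
    """Hypothetical: the xdr-write bit found on the stored record after a restart,
    given who wrote the record and which server version wrote it."""
    return written_by_xdr_client and bit_persisted_with_record(writer_version)


# Illustrative version strings only.
print(stored_xdr_write_bit(True, "4.8.0.5"))   # False: XDR origin is lost
print(stored_xdr_write_bit(True, "4.9.0.17"))  # True: bit stored in the record
```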

Therefore, when upgrading straight from 4.8 to 5.0 after shutting down the whole cluster, using the rewind feature may ship some records that were written by an XDR client, even when the forward configuration is not turned on.

This is because, when a cluster starts on version 5.0 for the first time, an ‘initial’ timestamp is stored in the system metadata (smd) folder to keep track of when records are expected to start having the xdr-write bit stored. When no ‘last update time’ (LUT) is specified to rewind from, the rewind feature goes back to 3 seconds before the timestamp in the smd file. Records written by the 4.8 nodes before the upgrade do not have the xdr-write bit set, so rewind can ‘forward’ records that were written by an XDR client even though forwarding is turned off. Running version 4.9 for a few seconds prevents this edge case.
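The rewind edge case can be illustrated with a simplified model. This Python sketch is not the XDR implementation: the StoredRecord class, the function names, and the use of milliseconds for the LUT are assumptions made for the example. It encodes only the behavior described above: without an explicit LUT, rewind starts 3 seconds before the smd ‘initial’ timestamp, and with forward off, records whose stored xdr-write bit is set are skipped.

```python
from dataclasses import dataclass

# Per this FAQ: without an explicit LUT, rewind starts 3 seconds before the
# 'initial' timestamp stored in the smd folder. Milliseconds are assumed here.
REWIND_MARGIN_MS = 3_000


@dataclass
class StoredRecord:
    key: str
    lut_ms: int          # last-update-time (hypothetical ms units)
    xdr_write_bit: bool  # bit as persisted in the record (never set by 4.8)


def rewind_start_ms(initial_ts_ms: int) -> int:
    """Lower bound of a rewind that does not specify a LUT."""
    return initial_ts_ms - REWIND_MARGIN_MS


def shipped_by_rewind(records: list[StoredRecord], initial_ts_ms: int,
                      forward: bool = False) -> list[str]:
    """Keys a simplified rewind would ship to the next destination."""
    start = rewind_start_ms(initial_ts_ms)
    shipped = []
    for rec in records:
        if rec.lut_ms < start:
            continue  # older than the rewind window
        if rec.xdr_write_bit and not forward:
            continue  # known XDR write: not forwarded when forward is off
        shipped.append(rec.key)
    return shipped


initial_ts = 100_000
records = [
    StoredRecord("a", 90_000, False),   # outside the rewind window
    StoredRecord("b", 98_500, False),   # XDR write on 4.8: bit never stored
    StoredRecord("c", 101_000, True),   # XDR write on 5.0: bit stored
    StoredRecord("d", 102_000, False),  # regular client write on 5.0
]
print(shipped_by_rewind(records, initial_ts))  # ['b', 'd']
```

Record "b" is the edge case: it was written by an XDR client, but because a 4.8 node wrote it, the stored bit is unset and the rewind ships it even with forward off.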

Notes

  • Refer to the 5.0 Upgrade page for the full instructions to upgrade to 5.0.

Keywords

UPGRADE 5.0 PRE 4.9 CLUSTERING

Timestamp

November 2020

© 2021 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.