General questions on rolling restart

From time to time it may be necessary to perform a rolling restart of the cluster, for example to upgrade the kernel or the Aerospike server version, or to make non-dynamic configuration changes. A rolling restart consists of consecutive restarts of individual nodes such that only one node is down at any one time, with the aim that, at the end of the process, all nodes have been restarted.

Each time a node leaves or joins the cluster, the data will rebalance (refer to migrations for details).

As part of the new cluster protocol introduced in version 3.13 (refer to the 3.13 upgrade guide for details on the upgrade procedure), partial partitions are kept in the cluster until migrations for the relevant partition are complete (that is, the partition has as many full copies available in the cluster as the replication factor dictates). It is therefore not necessary to wait for migrations to complete between each node restart (unless there is a namespace where the data is not persisted – storage-engine memory). Do wait for a node to have rejoined the cluster before taking the next node down. One way to check whether a node has rejoined is to verify that the cluster size is consistent across all nodes. For example:

asadm> show statistics like cluster_size -flip
~Service Statistics (2018-01-01 10:00:00 UTC)~
            NODE   cluster_size              7              7              7              7              7              7              7
Number of rows: 7
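As an illustration, this check can be scripted. The following is a minimal sketch, not an official procedure; it assumes asadm is installed and can reach the cluster, and that the output parsing yields one cluster_size value per node:

```shell
#!/usr/bin/env bash
# Minimal sketch: decide whether it is safe to proceed to the next node by
# checking that every node reports the same cluster_size.

same_cluster_size() {
    # $1: newline-separated cluster_size values, one per node
    local uniq_count
    [ -n "$1" ] || return 1
    uniq_count=$(printf '%s\n' "$1" | sort -u | wc -l)
    [ "$uniq_count" -eq 1 ]
}

# Against a live cluster (not run here), something like:
# sizes=$(asadm -e "show statistics like cluster_size -flip" \
#         | awk '/cluster_size/ {for (i = 3; i <= NF; i++) print $i}')
# same_cluster_size "$sizes" && echo "all nodes agree on cluster size"
```

Note that agreeing on cluster size is a necessary but simple check; comparing the reported size against the expected node count adds a further safeguard.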

Application-side considerations when performing a rolling restart

  • Potential application timeouts: Every time a node is stopped, some timeouts may occur on the client library side due to transactions that were in flight and did not reach the node before it was taken down. Depending on the client-side policy (timeout and retries), such issues may or may not be noticed on the application side.

  • Added latency due to duplicate resolution: Adding and removing nodes in a cluster causes migrations to kick in to re-balance the cluster, which can introduce multiple versions of a record, depending on the write workload. By default, Aerospike will duplicate resolve across the different versions for write transactions to ensure the best version is used. For some use cases, it may be acceptable to disable duplicate resolution using the disable-write-dup-res configuration parameter. For read transactions, the client's policy drives whether to duplicate resolve or not, and the server can enforce the behavior through the read-consistency-level-override configuration parameter.
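If disabling duplicate resolution is acceptable for the use case, both parameters are dynamic namespace settings. A hedged sketch of changing them at runtime with asinfo; the namespace name is a placeholder, and exact parameter availability depends on the server version:

```shell
# "myNamespace" is a placeholder -- substitute your own namespace.
# Disable duplicate resolution for write transactions (weigh the trade-offs first):
asinfo -v "set-config:context=namespace;id=myNamespace;disable-write-dup-res=true"

# Force reads to skip duplicate resolution regardless of client policy:
asinfo -v "set-config:context=namespace;id=myNamespace;read-consistency-level-override=one"
```

Remember to apply such dynamic changes on every node, and to persist them in the configuration file if they should survive restarts.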

  • Potential increased load caused by migrations: Migrations will cause data to be exchanged between nodes, which could, depending on the overall workload, impact network and/or disk I/O performance. By default, migration settings are such that the impact should be negligible. Refer to the Tuning Migrations documentation for further details.

  • Memory consideration: Proceeding with a rolling restart without waiting for migrations to complete between each node could temporarily increase memory consumption across the nodes in the cluster. As a node is taken down, the remaining nodes will start receiving records for partitions previously owned by that node, and those records are held until the migration of such partitions is complete. If another node is taken down before those migrations complete, further partitions can start populating, and so on. In some cases this can create a non-negligible temporary increase in the total number of records owned by each node (hence memory and disk usage), which will eventually subside when migrations complete at the end of the rolling restart procedure.

Versions running the older cluster protocol (prior to 3.13)

Depending on the nature of the write workload, consider one of the following two approaches to mitigate the risk of losing updates to records during migrations:

  1. Stop all write traffic to the cluster until the rolling restart is complete (the last node has rejoined the cluster). This guarantees updates will not be lost during migrations, as long as write duplicate resolution has not been disabled. If write load continues while nodes are being restarted, some records may not be available until a node has rejoined the cluster, and some updates may be lost during conflict resolution.

  2. During the rolling restart, wait for migrations to complete before taking the next node out of the cluster. Note that setting disable-write-dup-res to true disables write duplicate resolution, which can improve write performance during migrations but may also result in lost updates.
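For the second approach, the wait between node restarts can be scripted. A rough sketch, assuming asadm is available and that the statistic migrate_partitions_remaining exists on your server version (older servers expose different migration statistics, such as migrate_progress_send and migrate_progress_recv):

```shell
#!/usr/bin/env bash
# Rough sketch (not an official procedure): consider migrations complete only
# when every node reports zero partitions remaining.

migrations_done() {
    # $1: newline-separated migrate_partitions_remaining values, one per node
    local total
    total=$(printf '%s\n' "$1" | awk '{s += $1} END {print s + 0}')
    [ "$total" -eq 0 ]
}

# Against a live cluster (not run here), something like:
# remaining=$(asadm -e "show statistics like migrate_partitions_remaining" \
#             | awk '/migrate_partitions_remaining/ {for (i = 2; i <= NF; i++) print $i}')
# migrations_done "$remaining" && echo "safe to restart the next node"
```

Polling such a check in a loop (with a sleep between iterations) before restarting each node keeps the procedure conservative on the older cluster protocol.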