Cluster migration - mem and disk reduction


#1

We are planning a live rolling migration of our existing cluster. The current config consists of two namespaces: one persistent, one in-memory. Existing cluster: 10 i3.4xlarge instances (16 cores, 122G mem), replication factor of 2, primary namespace pointed at 2 ephemeral NVMe volumes at 1.7T each.

Migrating to: 8 c5d.9xlarge instances (36 cores, 72G mem), each with a single 900G ephemeral NVMe storage device.

Our namespace utilization is well below the smaller storage we are migrating to (see attached info namespace), so we are reducing mem and space, and upping cpu.

Two main questions are:

  1. With namespace utilization being well below the new cluster disk size, will we be okay rolling in the new hosts without interruption? Same with removing old hosts post-migration?
  2. We will need to reduce the memory-size configuration values on the new hosts. Can this be inconsistent across namespace configurations in the cluster? (e.g. host1: namespace a {memory-size 115G}, new-smaller-host2: namespace a {memory-size 65G})


#2

Are you running Aerospike 3.14 or later?


#3

3.15.1.3. TIL there is also a 10 char post minimum.


#4

Post 3.14, this all works as you would expect. When taking a node down, be sure to wait for migrations to complete before taking down the next. (There were a few complicated caveats prior to 3.14.)
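As a rough sketch of the "wait for migrations" check, the snippet below parses a node's statistics output and reports whether migrations have finished. It assumes the statistics arrive as a semicolon-separated `key=value` string and that the relevant counter is named `migrate_partitions_remaining`; both are assumptions to verify against your server version, and the helper names are hypothetical.

```python
def parse_info_stats(raw):
    """Parse a 'key=value;key=value' statistics string into a dict of strings."""
    stats = {}
    for pair in raw.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            stats[key] = value
    return stats


def migrations_done(raw, key="migrate_partitions_remaining"):
    """Return True when the (assumed) remaining-partitions counter is zero."""
    stats = parse_info_stats(raw)
    if key not in stats:
        # Fail loudly rather than assume migrations are finished.
        raise KeyError(f"statistic {key!r} not found; check the name for your version")
    return int(stats[key]) == 0


# Hypothetical sample output, not captured from a real node.
sample = "cluster_size=8;migrate_partitions_remaining=12"
print(migrations_done(sample))
```

In a rolling migration you would poll every node with a check like this and only remove the next host once all of them report that migrations are done.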

Yes, this can be inconsistent, i.e. it isn’t required to be identical across the cluster. It will be fine as long as you are below your eviction and stop-writes high-water-marks. If you do breach the eviction high-water-mark, the nodes will start to evict, and the eviction depth could become extremely skewed on some partitions.

If you think this could be a risk, you could mitigate it by setting the high-water-marks on the larger nodes so that they represent the same absolute amount of memory as the high-water-marks on the smaller nodes. For example, say the large nodes have 115 G RAM and the smaller nodes have 65 G RAM with hwm = 90%; then solve for X where 115 * X = 65 * 90%.
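To make the arithmetic concrete, here is a small sketch that solves for X using the numbers from the example above (115 G and 65 G nodes, hwm = 90%). The function name is hypothetical; rounding down to a whole percent is an assumption that the setting takes integer percentages.

```python
def equivalent_hwm_pct(large_mem_gb, small_mem_gb, small_hwm_pct):
    """Return the high-water-mark percentage for the larger node that
    corresponds to the same absolute memory as the smaller node's mark."""
    threshold_gb = small_mem_gb * small_hwm_pct / 100.0  # 65 * 90% = 58.5 G
    return threshold_gb / large_mem_gb * 100.0


x = equivalent_hwm_pct(115, 65, 90)
print(f"{x:.1f}%")  # 50.9%
print(int(x))       # 50, rounded down so the larger node never exceeds 58.5 G
```

So with these numbers, a high-water-mark of roughly 50% on the 115 G nodes lines up with 90% on the 65 G nodes.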