Reducing "memory-size" by more than 50%


#1

According to the configuration reference, memory-size “Cannot be reduced by more than 50% of previously set value.” Can I reduce it by more than 50% as long as I do it in stages? Or is this an architectural limit?

Motivation:

I’m trying to convert a 12-machine cluster of r3.4xlarge machines (122GB RAM, 10% memory usage) to a similarly sized cluster of i3.large machines (15GB RAM, expected 80% memory usage). I’d like to add the new machines to the existing cluster, and then spin down the old ones, but I’ll have to reduce memory-size first.

So the plan would be:

  1. Increase memory HWM to 80%
  2. Reduce memory-size, in stages, to 11-12GB. [Will this be possible?]
  3. Add 13 new i3.large nodes
  4. Remove r3.4xlarge nodes one-by-one, waiting for migrations to complete between each
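If the 50% rule applies per change, step 2 amounts to repeated halvings down to the target. A minimal sketch of that schedule (the 100G starting value and the helper name are illustrative assumptions, not from the docs; substitute your actual current memory-size):

```python
def staged_reductions(start_gb, target_gb):
    """Plan a sequence of memory-size reductions, halving (the max
    allowed per step) until one final step reaches the target."""
    steps = []
    current = start_gb
    while current / 2 > target_gb:
        current = current / 2  # each stage cuts at most 50%
        steps.append(current)
    steps.append(target_gb)
    return steps

# Assuming a current memory-size of 100G and a 12G target:
print(staged_reductions(100, 12))  # -> [50.0, 25.0, 12.5, 12]
```

So a 100G-to-12G reduction would take four separate configuration changes, if staged reductions are allowed at all.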

Some considerations that have led to the above plan:

  1. i3.large instances are far cheaper than r3.4xlarge.
  2. Any downtime at all on this cluster is extremely expensive
  3. We don’t have the Enterprise package, so a backup + restore + XDR is not an option
  4. Simple backup + restore involves too much downtime

#2

You can do it in stages. I’ve done it plenty of times without issue. I think they just put the limitation there to prevent accidental typo catastrophes.


#3

I don’t think you have to do that. Nodes don’t have to be unanimous on memory-size or even storage-engine.

Note: Running at 80% memory usage is not recommended. 60% is the typical high-water mark for memory; run well below it.

You can just add your new nodes to the cluster with the same namespace defined identically for all the unanimous parameters, but change the dynamic parameters like hwm and memory-size to whatever new values you want on your 15GB instances. Let the 15GB node finish migrations and then take one 122GB node out. Let the data rebalance fully. Make sure your 15GB node is happy (not getting into evictions and the like). Do this one by one. If need be, horizontally scale your cluster with one or more additional 15GB nodes.
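The “let migrations finish before removing the next node” step can be checked by polling namespace statistics. A minimal sketch, assuming the semicolon-delimited `key=value` format that `asinfo` returns and the `migrate_partitions_remaining` statistic (verify the stat name against your server version):

```python
def migrations_done(stats_line):
    """Parse a semicolon-delimited namespace stats response and report
    whether all partition migrations have completed."""
    stats = dict(pair.split("=", 1) for pair in stats_line.split(";"))
    return int(stats.get("migrate_partitions_remaining", 0)) == 0

# Example response fragment (illustrative values):
print(migrations_done("objects=1000;migrate_partitions_remaining=0"))   # -> True
print(migrations_done("objects=1000;migrate_partitions_remaining=37"))  # -> False
```

In practice you would feed this the output of something like `asinfo -v 'namespace/<your-ns>'` in a loop, and only proceed to the next node removal once it returns true.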

hwm, storage-engine and memory-size parameters are not unanimous, so each node in your cluster can specify a different memory size for that namespace, or even a different storage engine with a different hwm. This is to allow easy migrations such as the one you are planning.
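For example, the namespace stanza on one of the new 15GB nodes might look like this (a sketch: the namespace name, device path, and values are placeholders, and the unanimous parameters such as replication-factor must match the existing nodes exactly):

```
namespace myns {
    replication-factor 2        # unanimous: must match the rest of the cluster
    memory-size 12G             # per-node: smaller than on the 122GB nodes
    high-water-memory-pct 80    # dynamic, per-node
    storage-engine device {
        device /dev/nvme0n1     # placeholder device path
    }
}
```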

Suggestion: Before trying anything, make a backup of your data using the backup utility.


#4

That is correct.


#5

Can you provide more information about this? We’ve been running a 10TB cluster at 80% HWM for a year now without problems. 60% would cost us an extra $100K/yr. That’s not an expense that I’d like to incur without knowing exactly why.


#6

What I was able to find on the memory HWM was that it’s a recommended setting for small clusters. That is, if I have a 3-machine cluster and lose one box, I expect memory usage to increase by 50% on the other two.

For large clusters, 60% is overly conservative, if that is the only logic behind it. I don’t know what other considerations there are.


#7

There is some overhead, but the main reason is to allow for extra capacity for when you lose a server.

80% * ((1/(n-1)) + 1), where n is the number of nodes, shows what 80% becomes on a single node loss. Substituting n = 5: 80% usage becomes 100%. For an 11-node cluster it’s 88%. And so on. 60% gives you a lot of wiggle room: enough space to do things dynamically and enough capacity not to overwhelm your cluster in the event of an issue. I only ever expect to allow a single node loss in my 6-node cluster, so at 70% usage we may go up to 84%. We have redundant clusters, though, so only allowing a single loss is acceptable to us.
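The formula above can be written out directly (the function name is mine; the numbers reproduce the examples in this post):

```python
def usage_after_loss(current_pct, n, losses=1):
    """Projected per-node memory usage after `losses` nodes drop out of an
    n-node cluster: the same data spreads over (n - losses) nodes."""
    return current_pct * n / (n - losses)

print(usage_after_loss(80, 5))   # -> 100.0  (5 nodes at 80%, one lost)
print(usage_after_loss(80, 11))  # -> 88.0   (11 nodes at 80%, one lost)
print(usage_after_loss(70, 6))   # -> 84.0   (6 nodes at 70%, one lost)
```

Note that n/(n-1) is the same as (1/(n-1)) + 1, just multiplied through.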


#8

Good. That was my understanding as well. Some things that make this safe (for our cluster):

  1. The final cluster will have more than 13 nodes (we’re choosing smaller instance sizes in order to have more nodes and therefore a higher HWM).
  2. Migrations do not happen instantly, and in fact take many hours on this cluster. You don’t reach the above numbers until migrations are complete. So we mitigate risk by maintaining a reasonable SLA on hardware replacement.
  3. We can live with evictions. Our data is worth dramatically more money at the beginning of its lifespan than at the end.