Reboot a 2-node cluster

#1

Hello.

I have a 2-node cluster. What happens if I reboot both nodes simultaneously? Is it safe, or is it better to reboot one of them, wait until synchronization completes, and then reboot the second one?

#2

It depends on what you mean by ‘safe’. You will obviously lose availability, which most users wouldn’t consider ‘safe’. But if you use persistence, i.e. ‘storage-engine device’, you shouldn’t lose any data, if that is your concern. Also note that a cold start is much slower than the Enterprise ‘Fast Start’, so depending on the amount of data and your situation, you could be unavailable for a significant amount of time.

#3

Thank you for your answer. Yes, my concern is data consistency, and now I understand that nothing bad should happen. Could you please clarify how synchronization works? Does its speed depend on whether I reboot the 2 nodes simultaneously or one by one?

#4

In Enterprise, we store the primary index in shared memory and re-attach to it on restart, which makes restarts much faster. When an Enterprise node’s host is rebooted, or the node is shut down uncleanly, it cannot Fast Start and falls back to a cold start. In Community edition, we always cold-start, since the index isn’t stored in shared memory and is therefore always lost on restart. A cold start rebuilds the primary index from the records found in storage; this requires reading the entire storage layer, which can take a significant amount of time.

Restarting one node at a time doesn’t avoid the cold start, but it lets you wait for the restarted node to rejoin and for migrations to complete before restarting the next node, which minimizes downtime. Also, if you had a third node, you wouldn’t need to wait for migrations to complete after each node starts, since the latest data would still be available on the two remaining nodes.
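A minimal Python sketch of the one-node-at-a-time procedure described above. The hooks restart_node, node_is_up and migrations_complete are hypothetical placeholders, not an Aerospike API: you would wire them to whatever tooling you already use (for example asadm or asinfo) to restart a node and check cluster state. FakeCluster is an in-memory stand-in so the sketch runs without a real cluster.

```python
import time
from typing import Callable, Dict, List, Sequence


def wait_until(condition: Callable[[], bool], what: str,
               poll_seconds: float, timeout_seconds: float) -> None:
    """Poll `condition` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while not condition():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out waiting for {what}")
        time.sleep(poll_seconds)


def rolling_restart(nodes: Sequence[str],
                    restart_node: Callable[[str], None],
                    node_is_up: Callable[[str], bool],
                    migrations_complete: Callable[[], bool],
                    poll_seconds: float = 5.0,
                    timeout_seconds: float = 3600.0) -> None:
    """Restart nodes one at a time using hypothetical hooks.

    Each node must rejoin and migrations must finish before the next node
    is restarted, so the other node stays up while each one restarts.
    """
    for node in nodes:
        restart_node(node)
        # A cold start rebuilds the primary index from storage, so this
        # wait can be long for large data sets.
        wait_until(lambda n=node: node_is_up(n), f"{node} to rejoin",
                   poll_seconds, timeout_seconds)
        # Only move on once the restarted node has caught up.
        wait_until(migrations_complete, "migrations to complete",
                   poll_seconds, timeout_seconds)


class FakeCluster:
    """In-memory stand-in that records calls; it models no real behaviour."""

    def __init__(self, nodes: Sequence[str]) -> None:
        self.log: List[str] = []
        self.down_polls: Dict[str, int] = {n: 0 for n in nodes}
        self.migration_polls = 0

    def restart_node(self, node: str) -> None:
        self.log.append(f"restart {node}")
        self.down_polls[node] = 1   # report "down" for one poll
        self.migration_polls = 1    # report "migrating" for one poll

    def node_is_up(self, node: str) -> bool:
        if self.down_polls[node] > 0:
            self.down_polls[node] -= 1
            return False
        self.log.append(f"{node} up")
        return True

    def migrations_complete(self) -> bool:
        if self.migration_polls > 0:
            self.migration_polls -= 1
            return False
        self.log.append("migrations done")
        return True


if __name__ == "__main__":
    cluster = FakeCluster(["node-a", "node-b"])
    rolling_restart(["node-a", "node-b"], cluster.restart_node,
                    cluster.node_is_up, cluster.migrations_complete,
                    poll_seconds=0)
    print(cluster.log)
    # ['restart node-a', 'node-a up', 'migrations done',
    #  'restart node-b', 'node-b up', 'migrations done']
```

The design simply blocks on both checks before moving to the next node. With a third node, as described above, the migration wait could be relaxed.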

Also note that I have assumed we are discussing AP mode and not strong-consistency mode, which is an Enterprise feature. With a bit of effort, you can violate consistency in AP mode; if you are very sensitive to consistency violations, you should consider using strong consistency.

#5

We can’t really give you a straight answer without more details. What version/edition of Aerospike are you using? What’s your config like? In general, though, you really don’t want to reboot the entire cluster at once… in almost every situation. Beyond that, the maintenance process can differ depending on how things are set up.

closed #6

This topic was automatically closed 6 days after the last reply. New replies are no longer allowed.