The cluster page says that partitions need to be moved to other nodes when a node fails or a new node is added. In this case, does it mean all data will be copied from one node to another?
If so, is there a best practice for limiting the size of a namespace? (If a namespace is too big, say 4 TB, then each partition of the namespace is about 1 GB, and copying these partitions across the cluster might be very resource-consuming.)
If the node is offline for just a few minutes (due to a network issue, let’s say), will the migration continue? Or will it detect this and smartly undo the migration?
If you have 4TB of replicated data, it’s going to take a long time to sync no matter what. There’s no magic answer to this, data is data.
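To put rough numbers on that, here is a back-of-the-envelope sketch. It relies on Aerospike splitting each namespace into 4096 partitions, which is where the "about 1 GB per partition" figure comes from. The node count and the sustained migration bandwidth are made-up assumptions for illustration only, and the estimate ignores replication factor, throttling and concurrent load.

```python
# Rough estimate only; NODES and BANDWIDTH are hypothetical values.
PARTITIONS_PER_NAMESPACE = 4096  # fixed number of partitions per Aerospike namespace

TIB = 1024 ** 4
GIB = 1024 ** 3
MIB = 1024 ** 2


def partition_size_bytes(namespace_bytes):
    """Average partition size, assuming data is spread evenly."""
    return namespace_bytes / PARTITIONS_PER_NAMESPACE


def transfer_seconds(num_bytes, bandwidth_bytes_per_sec):
    """Time to move num_bytes at a constant bandwidth."""
    return num_bytes / bandwidth_bytes_per_sec


namespace_bytes = 4 * TIB
per_partition = partition_size_bytes(namespace_bytes)

NODES = 8                # assumption: cluster size
BANDWIDTH = 100 * MIB    # assumption: sustained migration throughput per second

# If one node is lost, roughly its share of the data has to be re-copied.
node_share = namespace_bytes / NODES
hours = transfer_seconds(node_share, BANDWIDTH) / 3600

print(f"per partition: {per_partition / GIB:.2f} GiB")
print(f"re-copying one node's share: about {hours:.1f} hours")
```

Plug in your own cluster size and whatever throughput you actually observe; the point is only that the time scales linearly with the amount of data that has to move.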
Aerospike already does its best to make sure migrations finish as fast as possible without disturbing the normal read/write load, and in my experience it has handled short downtime just fine (like updating a node’s software). It is recommended to make sure the migration is completely finished, though, just in case. The easiest way is to check that the incoming and outgoing partition counts for migrations are both 0 (in AMC or via a status check).
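If you want to script that check, a minimal sketch could look like the following. It assumes you already have the semicolon-separated `key=value` statistics string for a namespace (for example from `asinfo -v "namespace/<name>"`), and that your server version reports `migrate_tx_partitions_remaining` and `migrate_rx_partitions_remaining`; older versions use different stat names, so adjust the keys to whatever yours prints. The sample strings below are invented.

```python
# Assumption: these are the outgoing/incoming migration counters on your version.
MIGRATION_KEYS = (
    "migrate_tx_partitions_remaining",
    "migrate_rx_partitions_remaining",
)


def parse_stats(raw):
    """Turn 'a=1;b=2' into {'a': '1', 'b': '2'}."""
    stats = {}
    for pair in raw.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            stats[key] = value
    return stats


def migrations_finished(raw):
    """True only if every migration counter is present and equal to 0."""
    stats = parse_stats(raw)
    for key in MIGRATION_KEYS:
        if key not in stats:
            return False  # unknown state: do not assume migrations are done
        if int(stats[key]) != 0:
            return False
    return True


# Invented sample output, for illustration only.
still_migrating = "objects=1000;migrate_tx_partitions_remaining=12;migrate_rx_partitions_remaining=0"
all_done = "objects=1000;migrate_tx_partitions_remaining=0;migrate_rx_partitions_remaining=0"

print(migrations_finished(still_migrating))
print(migrations_finished(all_done))
```

Run it against every node, not just one, before taking the next node down.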
Will Aerospike support using network storage, such as NFS or HDFS, so that migration can happen without copying data? Or maybe use network storage during the migration and switch to local storage once the data copying is done? Just a wild idea…