We have two different clusters running Aerospike CE 3.5.9. The first cluster has 3 nodes and the second has 4. Both run on GCE. Each configuration uses only 6 of the 8 available CPUs, with local SSDs.
We have lots of local SSD problems on GCE, and lots of network problems too.
What I’ve seen needs confirmation, but here is the pattern: each time one of the nodes is excluded from the cluster and runs “standalone”, if it is the principal ** node, then when it rejoins the cluster all the other nodes become unstable during migration, and in the end we have to restart those nodes every time.
If the failing node is not the principal, there’s no problem: we just restart it and it comes back into the cluster.
Do you have any idea what might be causing this?
** Note from @Mnemaudsyne: by ‘principal’, the user means ‘main’.