I was just trying this today on a smaller, in-memory-only cluster. I set up asd 3.6.4 on a 2-node cluster (4 cores each) in a placement group, in a VPC, in a single AZ. The cluster I am migrating from has 4 nodes (2 cores each) across 2 AZs. As soon as I switched the traffic over, ping times went crazy, into the 50ms range. I saw some warnings about running out of file descriptors, so I increased the limit, but that didn’t change anything. I also tried bouncing the nodes, one by one and then all together, but at that point they couldn’t form a stable cluster anymore. I ended up switching back to the old cluster. CPU usage was light (30%), the load average wasn’t high either (around 2), and network I/O was average.
Sounds like there’s something else at play, not just VPC/networking. Are there any special or recommended kernel settings? Are there any maximum-connection settings that might affect us other than proto-fd-max (I just increased it to 50k)? We’re running Ubuntu 14.04 with kernel 3.13.0-36-generic.
P.S. I installed a new cluster with the same hardware layout as before (four 2-core machines across 2 AZs), but entirely inside the VPC, and now I am back to normal. The fact that this works on more boxes makes me think we’re hitting some sort of per-box limit.
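
In case it helps anyone check the per-box limit theory: a rough Python sketch for comparing the OS-level open-file limit of a running process against the configured proto-fd-max. It only reads the standard Linux `/proc/<pid>/limits` and `/proc/<pid>/fd` entries. The helper names (`parse_max_open_files`, `effective_fd_cap`, `report`) are made up for illustration, and the idea that the lower of the two values is what actually applies is my assumption about how the two limits interact, not something confirmed for asd.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    A limit reported as 'unlimited' is returned as None.
    Returns None if the row is not present.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Columns after the label are: soft limit, hard limit, units.
            soft, hard = line[len(label):].split()[:2]
            to_int = lambda v: None if v == "unlimited" else int(v)
            return to_int(soft), to_int(hard)
    return None


def effective_fd_cap(soft_limit, proto_fd_max):
    """Assumption: the process is capped by the lower of the two values."""
    if soft_limit is None:
        return proto_fd_max
    return min(soft_limit, proto_fd_max)


def report(pid, proto_fd_max):
    """Print descriptor usage for a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        limits = parse_max_open_files(f.read())
    if limits is None:
        print("no 'Max open files' row found")
        return
    soft, hard = limits
    # Each entry in /proc/<pid>/fd is one open descriptor.
    in_use = len(os.listdir(f"/proc/{pid}/fd"))
    cap = effective_fd_cap(soft, proto_fd_max)
    print(f"soft={soft} hard={hard} in_use={in_use} effective_cap={cap}")
```

Calling `report(pid, 50000)` with the pid of the asd process (as root, since `/proc/<pid>/fd` of another user's process is not readable otherwise) shows whether the 50k setting is actually reachable on that box.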