Prevent clustering between installations

DickV · October 21, 2016, 7:37pm

How do you prevent one Aerospike installation from communicating with, clustering with, and interfering with another? I had a working Aerospike installation on our corporate network running a YCSB test, which failed as soon as Aerospike was installed on another system on the same network. They formed a cluster, and statistics showed cluster-size set to 2. Setting “paxos-max-cluster-size 1” in the .conf didn’t help. Both were installed and configured as memory only, which I thought would have kept each one to a single system. I can’t make either work when both are turned on. I have to shut one off and reboot the other to regain function.

kporter · October 21, 2016, 8:09pm

Nodes discover each other via heartbeat, so running servers on different heartbeat ports will prevent them from merging.

Also in the latest version there is a cluster-name parameter. This field prevents nodes from joining clusters by a different name.

DickV · October 21, 2016, 8:35pm

Thanks Kevin. I understand that most users want auto-clustering. An additional question: how can I prevent it from happening during the install of the next node?

kporter · October 21, 2016, 9:08pm

Configure the node to use a different heartbeat port and/or cluster name.
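
A minimal sketch of how those two settings might look in aerospike.conf, assuming a release that supports cluster-name and a node using multicast heartbeat. The name ycsb_node_a and the port 9919 are made-up values for illustration. The lines belong inside the existing service and heartbeat blocks rather than in new ones, and the node needs a restart to pick up a change made in the file.

```
service {
    # Hypothetical name; give each independent installation its own.
    cluster-name ycsb_node_a
    # ...existing service settings stay as they are...
}

network {
    heartbeat {
        mode multicast
        # Hypothetical port; the default multicast heartbeat port is 9918.
        port 9919
        # ...existing multicast group, interval and timeout lines stay as they are...
    }
}
```

Only the heartbeat port changes here. The service port that clients and tools connect to is configured in a separate block and is left untouched in this sketch.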

DickV · October 22, 2016, 12:02am

Since I’ve not changed either of those before, will that make it difficult for YCSB to function?

DickV · October 24, 2016, 12:17pm

Hi Kevin, thanks for the help! I’ve looked through the Aerospike docs and do not see any info related to disabling automatic clustering. I’m surprised, given the negative effect it had on my installations. I had a system running a YCSB test and it failed immediately after the other Aerospike began operating. Every read returned errors. On top of that, I could no longer control the first installation. Restarting the Aerospike service had NO effect. Other asinfo and AQL commands also did not work. Aerospike became totally unusable. This would seem to be a BAD thing for any new user. What if someone else on their network was also experimenting with Aerospike? Each would experience sudden and unexplained failures.

I have seen the information on heartbeat, but that is either multicast or unicast, and it is not clear which would be better to use when changing the heartbeat port. Did you mean to change only the 3000, and not 3001-3003 also? What about 9918? Are there any negative effects when changing the heartbeat? Would all the utilities (asinfo, aql, etc.) still work?

Cluster Name sounds like a good choice for me, but where or how is it set?

-Dick

DickV · October 24, 2016, 2:11pm

Hi Kevin, I found cluster-name in the docs and set it, but it does NOT keep them from talking and failing. After setting cluster-name on both, when I run aql on one system and type “asinfo cluster-name”, it shows both. And they are affecting each other’s operation, negatively.

How do I keep separate nodes from talking to each other? Yet allow all the utilities to work? Which ports to set? And every install has to choose a different range. -Dick

DickV · October 24, 2016, 2:40pm

When I did a “systemctl restart aerospike” it reset both systems!!! Even though they have different cluster-names.

DickV · October 24, 2016, 4:05pm

OK, so I have to change the heartbeat port, which by default is multicast 9918. Can I change it to any number in the range 0-65535?

kporter · October 24, 2016, 4:29pm

Yes, though you should avoid Linux’s privileged port range.

kporter · October 24, 2016, 4:31pm

Are both nodes hosted on the same machine?

If not, were there any warnings at the end of the other node’s logs?

kporter · October 24, 2016, 4:45pm

Could you describe how they are “affecting each other’s operation, negatively”?
Did you set it dynamically or statically (in config)?
1. If statically and you restarted the node then the tools should only report one or the other.
2. If dynamically then there are additional steps to prevent the tools from discovering them. The tools look at the alumni list (essentially a historical record of cluster members developed during runtime). This is so you can inspect a split brain cluster or see that nodes are down.

DickV · October 24, 2016, 4:59pm

They are on separate machines. I set the cluster-name in each node’s .conf and restarted each service. Then I had a YCSB test running on one and did a restart on the other and kaboom, both were cleared. I guess logging is not configured by default because there isn’t any /var/log/aerospike…

DickV · October 24, 2016, 5:01pm

Anyway, I have changed the multicast port and that seems to work…so far. But I’m wondering about the multicast traffic and if I should switch to unicast (mesh)? And if yes, then how to configure that?

kporter · October 24, 2016, 5:05pm

That is unexpected.

Logging is enabled by default, but accessing the logs has changed on platforms using systemd. If you are using systemd then by default you will need to access the logs via journalctl. You can also add file based logging.
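
As a rough sketch of file-based logging, assuming the standard logging stanza in aerospike.conf: the path below is only an example, and the directory has to exist and be writable by the server before the restart. On a systemd host the console output should be readable with journalctl -u aerospike, assuming the unit is named aerospike as in the systemctl command earlier in this thread.

```
logging {
    # Example path; choose any location the Aerospike process can write to.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
```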

We prefer multicast for operational ease, but on most (all?) cloud providers multicast isn’t an option. We have also seen some network hardware behave poorly with multicast. See here for mesh configuration.
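
A minimal sketch of a mesh (unicast) heartbeat stanza, assuming two nodes that are meant to cluster with each other. The 10.0.0.x addresses are placeholders, and 3002 is the mesh port mentioned elsewhere in this thread. Each seed line names another node's address and heartbeat port.

```
network {
    heartbeat {
        mode mesh
        # Heartbeat port this node listens on.
        port 3002
        # Placeholder peers; list the nodes this one should try to join.
        mesh-seed-address-port 10.0.0.11 3002
        mesh-seed-address-port 10.0.0.12 3002
        # ...existing interval and timeout lines stay as they are...
    }
}
```

With mesh, a node only contacts the peers it is told about, so a node with no seed lines should stay on its own unless some other node lists it as a seed.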

DickV · October 24, 2016, 5:08pm

I meant log files are not configured. Yes, it logs to stderr, the console, but what happens is errors start pouring out.

kporter · October 24, 2016, 5:09pm

Could you elaborate?

DickV · October 24, 2016, 5:14pm

On the system running the YCSB test, which is reading all of the 112M records, it suddenly starts scrolling read errors for non-existent keys, because its db has been emptied.

DickV · October 24, 2016, 5:15pm

So, cluster-name doesn’t keep them from talking to each other. But heartbeat port seems to.

DickV · October 24, 2016, 5:30pm

BTW, does the heartbeat port in the aerospike.conf file have to be different on each system or just not be 9918 for multicast or not 3002 for mesh?
