Can't connect to my cluster


#1

Hello,

I keep getting the same error code 11 when trying to connect to my cluster (only 1 node). The log shows:

Nov 29 2015 05:30:15 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb969f9bd270008 and self bb92439d95b8a44
Nov 29 2015 05:30:15 GMT: INFO (paxos): (paxos.c::2412) CLUSTER INTEGRITY FAULT. [Phase 1 of 2] To fix, issue this command across all nodes:  dun:nodes=bb969f9bd270008
Nov 29 2015 05:30:19 GMT: WARNING (tsvc): (thr_tsvc.c::382) rejecting client transaction - initial partition balance unresolved
Nov 29 2015 05:30:20 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb969f9bd270008 and self bb92439d95b8a44
Nov 29 2015 05:30:20 GMT: INFO (paxos): (paxos.c::2412) CLUSTER INTEGRITY FAULT. [Phase 1 of 2] To fix, issue this command across all nodes:  dun:nodes=bb969f9bd270008

So I ran the command:

asinfo -v dun:nodes=bb969f9bd270008

It still doesn’t work, except that the log now shows this instead:

[Ignoring succession list mismatch with dunned node bb969f9bd270008 in different cluster]

I don’t know what to do now. The server is online, AMC works too, I just can’t perform any action on the database…

Thanks in advance.


#2

Kicker,

Which version of Aerospike Server are you using?

It looks like you have a 2-node cluster and the nodes don’t seem to be forming a healthy cluster. Has anything in your network changed? Are you using mesh or multicast clustering mode?

Can you share the output of the following command, run on both nodes in your cluster?

asadm

admin> info

-samir


#3

Hi,

I’m using 3.5.15. You were right: I had a second single-node cluster on another machine in my network, but it wasn’t recognized as a node of the first one in AMC, so I didn’t think it could be the source of my problem. I turned it off and now it works again.

Thanks!


#4

Hi again,

Not being able to run 2 single-node clusters on the same network causes some problems for my workflow. The first one, on a dedicated physical machine, is supposed to be available to everyone, and the other one, on a VM (with its own internal IP address), is for testing purposes.

I’d like some help configuring those so that they coexist on the network without messing with each other. Each time I tried to do it myself, I ended up either with some conflict issues like the one I just had, or Aerospike just not starting at all…

Thanks in advance.


#5

Hi, I was facing the same issue. When I changed the heartbeat port in /etc/aerospike/aerospike.conf, it started working fine. The reason could be that there are other Aerospike instances running in your network. From the log line “discrepancy between node bb969f9bd270008 and self bb92439d95b8a44” I can say it is trying to sync with other nodes in the network.
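
For reference, here is a minimal sketch of what that change might look like, assuming multicast heartbeat mode on a 3.x server. The multicast address and the port 9919 are example values only (9919 is simply "something other than what the other cluster uses"), and the interval and timeout values are carried over from a typical default config rather than being recommendations. Compare against your own file before copying anything.

```
# Inside the network { } stanza of /etc/aerospike/aerospike.conf
heartbeat {
    mode multicast
    address 239.1.99.222   # multicast group; newer releases name this setting multicast-group
    port 9919              # example: must differ from the port the other cluster uses
    interval 150
    timeout 10
}
```

The heartbeat port is read at startup, so the server most likely needs a restart for the change to take effect.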


#6

Kicker / Ranjit,

If you are running 2 Aerospike nodes in the same network with the multicast configuration, they are expected to talk to each other. Did you try mesh mode, where you can specify which nodes should be part of which cluster?

Aerospike computes the node ID from the MAC address and port. If you are running 2 nodes on the same physical box with the same default ports, both nodes are probably running with the same node ID. You should run these nodes with different ports (ports 3000-3004 for one node and other ports for the second, for example 4000-4004). This way the computed node IDs will be distinct, and you should be able to run multiple nodes on the same physical machine or inside different VMs.
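
A minimal sketch of a mesh heartbeat stanza, assuming a 3.x server that supports the mesh-seed-address-port setting. The IP address 192.168.1.10 is a placeholder for a node that belongs to this cluster, and 3002 is used as the heartbeat port only by way of example; neither value comes from the setups discussed above.

```
# Inside the network { } stanza of /etc/aerospike/aerospike.conf
heartbeat {
    mode mesh
    port 3002                                  # heartbeat port this node listens on
    mesh-seed-address-port 192.168.1.10 3002   # placeholder: address and port of a node in THIS cluster
    interval 150
    timeout 10
}
```

With mesh mode a node only exchanges heartbeats with the seeds it is given, so a test node on a VM that lists only its own address should stay out of the shared cluster.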

Let me know if you need further help. -samir


#7

Yes, the mesh mode configuration is working well. In that case I don’t need to change the heartbeat port.


#8

Working well for me too.