I am desperately trying to form a cluster with Aerospike 3.5.14 Community Edition. I have two LXC containers running on two different hosts, with Aerospike running inside each container. The containers are configured with a bridge, and routing between the hosts and the containers is done using iptables NAT rules (a sketch of such rules follows the address list below).
aerospike1:
container address: 192.168.11.12 (eth0); the address inside the container
host address for the container: 192.168.16.131

aerospike2:
container address: 192.168.11.12 (eth0); the address inside the container
host address for the container: 192.168.16.132
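For reference, the NAT rules on each host look roughly like the following sketch (shown for the aerospike1 host; the addresses and ports are the ones above, everything else is illustrative and may differ from my exact rules):

# Forward the Aerospike ports from the host address to the container
# (illustrative sketch only):
for port in 3000 3001 3002 3003; do
    iptables -t nat -A PREROUTING -d 192.168.16.131 -p tcp --dport "$port" \
        -j DNAT --to-destination 192.168.11.12:"$port"
done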
Here is the Aerospike network configuration for aerospike1:
network {
    service {
        address any
        port 3000
        access-address 192.168.16.131 virtual
        network-interface-name eth0
    }
    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.
        # List one or more other nodes, one ip-address & port per line:
        mesh-seed-address-port 192.168.16.132 3002
        interval 250
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
And here is the Aerospike configuration for the second node:
network {
    service {
        address any
        port 3000
        access-address 192.168.16.132 virtual
        network-interface-name eth0
    }
    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.
        # List one or more other nodes, one ip-address & port per line:
        mesh-seed-address-port 192.168.16.131 3002
        interval 250
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
The cluster does not form and I have the following errors in the logs:
Jun 29 2015 11:50:42 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb97379253e1600 and self bb97279253e1600
Jun 29 2015 11:50:42 GMT: INFO (paxos): (paxos.c::2412) CLUSTER INTEGRITY FAULT. [Phase 1 of 2] To fix, issue this command across all nodes: dun:nodes=bb97379253e1600
Jun 29 2015 11:50:42 GMT: INFO (hb): (hb.c::2319) HB node bb97379253e1600 in different cluster - succession lists don't match
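The log message suggests issuing the dun command across all nodes. As far as I understand, this can be done with asinfo on each node, e.g.:

# Issue the suggested command on every node (node ID taken from the log):
asinfo -h 127.0.0.1 -p 3000 -v "dun:nodes=bb97379253e1600"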
Yes, the two containers have the same IP address, but these addresses are isolated because they only exist inside the containers. Containers on other hosts do not know about and never see these addresses; they are NATed when packets go out of the host.
I have added the interface-address in the heartbeat section as advised, and it seems the other node is now detected:
Jun 29 2015 14:05:21 GMT: INFO (paxos): (paxos.c::3220) ... other node(s) detected - node will operate in a multi-node cluster
But I still get the “cluster integrity fault”. I have “ClusterSize 1” and I only see one node in the AMC interface.
Jun 29 2015 14:11:53 GMT: INFO (info): (thr_info.c::4804) migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 0 ::: sub_objects 0
Is there a command to see the nodes in the cluster?
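For anyone reading this later: as far as I can tell, asinfo can show this. For example:

# Ask a node which other cluster nodes it can see (service addresses):
asinfo -h 127.0.0.1 -p 3000 -v "services"

# Show the node's statistics, including cluster_size:
asinfo -h 127.0.0.1 -p 3000 -v "statistics" | tr ';' '\n' | grep cluster_size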
My cluster is finally up and running. It seems there is something strange with the “virtual access address”.
Below is a view of my setup using LXC containers. The containers run on top of the “host” and are connected to the outside world through NAT rules on the “lxcbr0” interface. Communication between the Aerospike servers is done over the 192.168.16.0/24 network (a private network between the servers).
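The NAT on the bridge is the usual LXC-style masquerade, roughly like this (a sketch; the 192.168.11.0/24 container subnet is an assumption based on the addresses above):

# Masquerade container traffic leaving the host (illustrative sketch):
iptables -t nat -A POSTROUTING -s 192.168.11.0/24 ! -d 192.168.11.0/24 -j MASQUERADE

And here is the working configuration: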
network {
    service {
        address any
        port 3000
        access-address 192.168.16.131
        network-interface-name eth0:1
    }
    heartbeat {
        mode mesh
        address any
        port 3002 # Heartbeat port for this node.
        interface-address 192.168.16.131
        # List one or more other nodes, one ip-address & port per line:
        mesh-seed-address-port 192.168.16.132 3002
        interval 250
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
The setup with this configuration is working. But I must not use the “virtual” keyword for the “access-address” in the service section, and I must create a virtual interface eth0:1 in the Aerospike container with the same IP that is configured on the host for the communication on the private subnet between the servers! Without this virtual interface I must set the “virtual” keyword, but then the cluster does not run properly. Note that when I set the “virtual” keyword, I am not able to connect to the node at localhost:3000 in the AMC.
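For completeness, the alias interface inside the container can be created roughly like this (a sketch; the /24 netmask is an assumption matching the private subnet above):

# Inside the aerospike1 container: add the host's private address as an
# alias so that access-address 192.168.16.131 exists locally on eth0:1.
ip addr add 192.168.16.131/24 dev eth0 label eth0:1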
There is something I do not understand about the setup using a virtual access-address, or there is something strange with it. Could somebody explain to me what I am doing wrong?
Thanks for your answer and the information. Note that I do not want Aerospike to be completely isolated. Aerospike is running in a container and is therefore isolated if I do not set up any NAT rules on the host system. I would like to reach the Aerospike cluster from trusted networks; this is handled with NAT and firewall rules on the host system.
My setup is working fine now. The only special/strange things I had to do were to not use the “virtual” keyword and to set in the container the same IP that is set on the host, so that Aerospike can be reached from outside.