Cluster running Aerospike on LXC

I am desperately trying to form a cluster with Aerospike 3.5.14 Community Edition. I have two LXC containers running on two different hosts, with Aerospike running inside each container. The containers are configured with a bridge, and routing between the hosts and the containers is done using iptables NAT rules.
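
For reference, the forwarding on each host is done with rules roughly like the following (a simplified sketch; the exact chains and interfaces depend on the LXC setup, and port 3003 is handled the same way):

# On the host of aerospike1: forward the Aerospike ports from the host's
# private address to the container, and masquerade outgoing container traffic.
iptables -t nat -A PREROUTING -d 192.168.16.131 -p tcp --dport 3000 -j DNAT --to-destination 192.168.11.12:3000
iptables -t nat -A PREROUTING -d 192.168.16.131 -p tcp --dport 3001 -j DNAT --to-destination 192.168.11.12:3001
iptables -t nat -A PREROUTING -d 192.168.16.131 -p tcp --dport 3002 -j DNAT --to-destination 192.168.11.12:3002
iptables -t nat -A POSTROUTING -s 192.168.11.0/24 ! -d 192.168.11.0/24 -j MASQUERADE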

aerospike1:

  • container address: 192.168.11.12 (eth0); the address inside the container
  • host address for the container: 192.168.16.131

aerospike2:

  • container address: 192.168.11.12 (eth0); the address inside the container
  • host address for the container: 192.168.16.132

Here is the aerospike network configuration for aerospike1:

network {
    service {
	address any
	port 3000
	access-address 192.168.16.131 virtual
	network-interface-name eth0
    }

    heartbeat {
	mode mesh
	port 3002 # Heartbeat port for this node.

	# List one or more other nodes, one ip-address & port per line:
	mesh-seed-address-port 192.168.16.132 3002
	interval 250
	timeout 10
    }

    fabric {
	port 3001
    }

    info {
	port 3003
    }
}

And here is the aerospike configuration for the second node:

network {
    service {
	address any
	port 3000
	access-address 192.168.16.132 virtual
	network-interface-name eth0
    }

    heartbeat {
	mode mesh
	port 3002 # Heartbeat port for this node.

	# List one or more other nodes, one ip-address & port per line:
	mesh-seed-address-port 192.168.16.131 3002
	interval 250
	timeout 10
    }

    fabric {
	port 3001
    }

    info {
	port 3003
    }
}

The cluster does not form and I have the following errors in the logs:

Jun 29 2015 11:50:42 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb97379253e1600 and self bb97279253e1600
Jun 29 2015 11:50:42 GMT: INFO (paxos): (paxos.c::2412) CLUSTER INTEGRITY FAULT. [Phase 1 of 2] To fix, issue this command across all nodes: dun:nodes=bb97379253e1600
Jun 29 2015 11:50:42 GMT: INFO (hb): (hb.c::2319) HB node bb97379253e1600 in different cluster - succession lists don't match

I do not understand what is wrong in my setup.

Best regards, Christophe Burki

Hi Christophe, you have two containers with the same eth0 address.

You could configure Aerospike in the following manner so that the nodes know which IP address to use for intra-cluster communication.

Within the heartbeat stanza, add a line that declares the internal cluster communication IP:

heartbeat {
    interface-address 192.168.16.131 # for the first node, and .132 for the second node
}

This part is explained at https://www.aerospike.com/docs/operations/configure/network/general/

This should help you get to a cluster where the two nodes can talk to each other on the correct IP/port.
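
You can also verify that each node can reach the other's fabric and heartbeat ports; for example, from the first node (a quick check using netcat, assuming it is installed in the container):

# From aerospike1: check that aerospike2's fabric (3001) and heartbeat (3002) ports are reachable
nc -vz 192.168.16.132 3001
nc -vz 192.168.16.132 3002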

-samir

Hi Samir,

Yes, the two containers have the same IP address, but these addresses are isolated inside the containers. Containers on other hosts do not know about and never see these addresses; they are “nat’ed” when packets go out of the host.

I have added the interface-address in the heartbeat section as advised. It seems the other node is detected.

Jun 29 2015 14:05:21 GMT: INFO (paxos): (paxos.c::3220) ... other node(s) detected - node will operate in a multi-node cluster

But I still have the “cluster integrity fault”. I have “ClusterSize 1” and I only see one node in the “amc” interface.

Jun 29 2015 14:11:53 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 0 ::: sub_objects 0

Is there a command to see the nodes in the cluster?

My cluster is finally up and running. It seems there is something strange with the “virtual access address”.

Below is a view of my setup using LXC containers. Containers run on top of the “host” and are connected to the outside world through NAT rules on the “lxcbr0” interface. The communication between Aerospike servers is done over the 192.168.16.0/24 network (a private network between the servers).

+----------------------------------------+
| aerospike   | eth0 192.168.11.12/24    |
| (container) | eth0:1 192.168.16.131/32 |
+----------------------------------------+
| host        | lxcbr0 192.168.11.1      |
|             | eth0 public_ip_address   |
|             | eth1 192.168.16.131/24   |
+----------------------------------------+

Here is my aerospike network configuration.

network {
    service {
            address any
            port 3000
            access-address 192.168.16.131
            network-interface-name eth0:1
    }

    heartbeat {
            mode mesh
            address any
            port 3002 # Heartbeat port for this node.                                                                                                     
            interface-address 192.168.16.131

            # List one or more other nodes, one ip-address & port per line:                                                                               
            mesh-seed-address-port 192.168.16.132 3002

            interval 250
            timeout 10
    }

    fabric {
            port 3001
    }

    info {
            port 3003
    }
}

The setup using this configuration is working, but I must not use the “virtual” keyword for the “access-address” in the service section, and I must create a virtual address eth0:1 in the aerospike container with the same IP that is configured on the host for communication on the private subnet between servers! Without this virtual interface I have to set the “virtual” keyword, but then the cluster does not run properly. Note that when I set the “virtual” keyword, I am not able to connect to node localhost:3000 in the AMC.
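
For reference, that alias can be created inside the container with something like the following (using iproute2; the label gives the alias the eth0:1 name that network-interface-name refers to in the configuration above):

# Inside the aerospike1 container (use 192.168.16.132/32 on aerospike2)
ip addr add 192.168.16.131/32 dev eth0 label eth0:1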

There is something I do not understand in the setup using a virtual access address, or there is something strange with it. Could somebody explain to me what I am doing wrong?

Hi Christophe,

Glad to see your cluster is up and running. Here is some useful information that you could review to get answers to your queries.

For connecting to AMC, you could specify the IP you have configured as access-address instead of localhost, and it should work fine.
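
Regarding a command to see the nodes in the cluster: asinfo (installed with the Aerospike tools package) can show this. For example, something like the following against your first node (adjust the host to the access-address of the node you want to query):

# List the other nodes this node knows about
asinfo -h 192.168.16.131 -p 3000 -v 'services'

# Check the cluster size reported by the node
asinfo -h 192.168.16.131 -p 3000 -v 'statistics' | tr ';' '\n' | grep cluster_size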

thanks, samir

Hi Samir,

thanks for your answer and the information. Note that I do not want Aerospike to be completely isolated. Aerospike is running in a container and is therefore isolated unless I set up NAT rules on the host system. I would like to reach the Aerospike cluster from trusted networks; this is handled with NAT and firewall rules on the host system.

My setup is working fine now. The only special/strange things I had to do were to not use the “virtual” keyword and to set in the container the same IP that is set on the host to reach Aerospike from outside.

Best regards, Christophe