Cluster integrity fault: Unable to create two node cluster


#1

I have installed Aerospike in an Ubuntu 14.04 virtual machine on each of two different machines. Both VirtualBox VMs use a NAT'd network, with port forwarding of Vbox TCP:3000-3003 -> TCP:3000-3003.

Node 1

IP: 192.168.1.10

Vbox NAT: 10.0.2.15

Configuration file:

service {
        user root
        group root
        paxos-single-replica-limit 1
        pidfile /var/run/aerospike/asd.pid
        service-threads 4
        transaction-queues 4
        transaction-threads-per-queue 4
        proto-fd-max 15000
}

logging {
        file /var/log/aerospike/aerospike.log {
                context any info
        }
}

network {
        service {
                address any
                port 3000
                access-address 192.168.1.10 virtual
                network-interface-name eth0
        }

        heartbeat {
                mode mesh
                address any
                port 3002
                interface-address 192.168.1.10
                mesh-seed-address-port 192.168.1.11 3002
                interval 150
                timeout 10
        }

        fabric {
                address 10.0.2.15
                port 3001
        }

        info {
                port 3003
        }
}

namespace test {
        replication-factor 2
        memory-size 1G
        default-ttl 0

        storage-engine device {
                file /opt/aerospike/data/test.dat
                filesize 1G
                data-in-memory true
        }
}

Log file, first 1000 lines: http://pastebin.com/8da9avef

Node 2

IP: 192.168.1.11

Vbox NAT: 10.0.2.15

Configuration file:

service {
        user root
        group root
        paxos-single-replica-limit 1
        pidfile /var/run/aerospike/asd.pid
        service-threads 4
        transaction-queues 4
        transaction-threads-per-queue 4
        proto-fd-max 15000
}

logging {
        file /var/log/aerospike/aerospike.log {
                context any info
        }
}

network {
        service {
                address any
                port 3000
                access-address 192.168.1.11 virtual
                network-interface-name eth0
        }

        heartbeat {
                mode mesh
                address any
                port 3002
                interface-address 192.168.1.11
                mesh-seed-address-port 192.168.1.10 3002
                interval 150
                timeout 10
        }

        fabric {
                address 10.0.2.15
                port 3001
        }

        info {
                port 3003
        }
}

namespace test {
        replication-factor 2
        memory-size 1G
        default-ttl 0

        storage-engine device {
                file /opt/aerospike/data/test.dat
                filesize 1G
                data-in-memory true
        }
}

Log file, first 1000 lines: http://pastebin.com/HhzcuGnb

I also ran asinfo -v 'dun:nodes=bb9ca5f3a270008,bb97abaa1270008' on both nodes, as the logs suggested.


#2

Hi,

I've been caught by the same issue (see "How to remove a node from a cluster"), but on an existing cluster. No solution found so far.

Regards, Alex


#3

Hi,

Check if you’re able to telnet/nc to tcp ports 3000-3003 in each direction.

Regards, Alex
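
The port check suggested in #3 can also be scripted. Below is a minimal sketch, assuming Python is available on both VMs; `check_ports` is a hypothetical helper, not part of Aerospike or its tools. It only attempts a plain TCP connect to each port, which is roughly what `nc -vz` does.

```python
import socket


def check_ports(host, ports, timeout=2.0):
    """Return {port: bool}, True when a TCP connect to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection performs the TCP handshake; the socket is
            # closed again as soon as the with-block exits.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            results[port] = False
    return results
```

To cover both directions, one would call `check_ports("192.168.1.11", range(3000, 3004))` on Node 1 and `check_ports("192.168.1.10", range(3000, 3004))` on Node 2. A successful connect only shows that something accepts TCP connections on that port; behind NAT with port forwarding it does not prove which address the peer sees as the source.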


#4

Hey, I ran the command on both nodes, and the output was similar in every case. No problems in the network.

nc -vz 192.168.1.11 3003
Connection to 192.168.1.11 3003 port [tcp/*] succeeded!

NETSTAT Outputs

Node 1

netstat -ant
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:3000            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:3001            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:3002            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:3003            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:8081            0.0.0.0:*               LISTEN
tcp        0      0 10.0.2.15:46847         192.168.1.11:3002      ESTABLISHED
tcp        0      0 10.0.2.15:22            10.0.2.2:56069          ESTABLISHED
tcp        0      0 10.0.2.15:3002          10.0.2.2:52724          ESTABLISHED
tcp6       0      0 :::22                   :::*                    LISTEN

Node 2

netstat -ant
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:3000            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:3001            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:3002            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:3003            0.0.0.0:*               LISTEN
tcp        0      0 10.0.2.15:3001          10.0.2.15:37376         ESTABLISHED
tcp        0      0 10.0.2.15:3002          10.0.2.2:58704          ESTABLISHED
tcp        0      0 10.0.2.15:3001          10.0.2.2:58667          ESTABLISHED
tcp        0    180 10.0.2.15:22            10.0.2.2:58654          ESTABLISHED
tcp        0      0 10.0.2.15:37889         192.168.1.10:3002       ESTABLISHED
tcp        0      0 10.0.2.15:37376         10.0.2.15:3001          ESTABLISHED
tcp6       0      0 :::22                   :::*                    LISTEN

#5

The address parameter in the fabric context currently supports only any. The interface-address in the heartbeat context also controls which interface is used for fabric.

Can you ping Node 2 from Node 1? The address you use to ping from Node 1 to Node 2 should be the same address used for interface-address and mesh-seed-address-port in the heartbeat context (and vice versa). If it isn’t, this is likely the issue.
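
The consistency rule in #5 (each node's mesh-seed-address-port should point at the address the other node uses as interface-address) can be checked mechanically. The sketch below is an illustration only: `heartbeat_settings` and `seeds_match` are hypothetical helpers, and the parsing assumes a heartbeat stanza with no nested braces, as in the configs posted in #1.

```python
import re


def heartbeat_settings(config_text):
    """Extract interface-address and mesh seeds from the heartbeat stanza."""
    # Assumes the heartbeat block contains no nested braces.
    match = re.search(r"heartbeat\s*\{(.*?)\}", config_text, re.DOTALL)
    body = match.group(1) if match else ""
    iface = re.search(r"^\s*interface-address\s+(\S+)", body, re.MULTILINE)
    seeds = re.findall(
        r"^\s*mesh-seed-address-port\s+(\S+)\s+(\d+)", body, re.MULTILINE
    )
    return {
        "interface_address": iface.group(1) if iface else None,
        "seeds": [(addr, int(port)) for addr, port in seeds],
    }


def seeds_match(local_cfg, peer_cfg):
    """True if a seed in local_cfg points at the peer's interface-address."""
    local = heartbeat_settings(local_cfg)
    peer = heartbeat_settings(peer_cfg)
    return any(addr == peer["interface_address"] for addr, _ in local["seeds"])
```

Calling `seeds_match(node1_cfg, node2_cfg)` and `seeds_match(node2_cfg, node1_cfg)` on the two config files checks both directions. This only verifies that the files agree with each other; whether the configured address is actually reachable from inside the NAT'd VM still has to be confirmed with ping, as suggested above.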