Vagrant mesh cluster configuration


#1

I have three Vagrant boxes set up (Ubuntu 12.04, 3.5.3), with a namespace configured with a replication-factor of 2.

Each Vagrant box has a host-only network set up, with the IP addresses 33.33.33.{91-93}.

I start node 1, and then nodes 2 and 3.

I can see in the logs that nodes 2 and 3 connect, but asmonitor/asinfo still show a ClusterSize of 1.
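For reference, the check boils down to reading the cluster size out of the semicolon-separated `key=value` text that an info "statistics" request returns. A minimal Python sketch, assuming that format; the helper name `parse_info` and the sample string are made up for illustration and were not captured from these nodes:

```python
def parse_info(raw):
    """Split 'k1=v1;k2=v2' info text into a dict of strings."""
    items = (item for item in raw.strip().split(";") if "=" in item)
    return dict(item.split("=", 1) for item in items)


# Hypothetical sample, shaped like the reply to a "statistics" info request.
sample = "cluster_size=1;objects=0"

stats = parse_info(sample)
print(int(stats["cluster_size"]))
```

Running the same parse against each node and comparing the values makes it obvious when every node believes it is a cluster of one.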

My mesh configuration is as follows for the first node:

    service {
        user root
        group root
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        pidfile /var/run/aerospike/asd.pid
        service-threads 4
        transaction-queues 4
        transaction-threads-per-queue 4
        proto-fd-max 15000
    }

    logging {
        # Log file must be an absolute path.
        file /var/log/aerospike/aerospike.log {
            context any info
        }
    }

    network {
        service {
            address any
            port 3000
            access-address 33.33.33.91
        }

        heartbeat {
            mode mesh
            port 3002 # Heartbeat port for this node.
            address 33.33.33.91
            interval 250
            timeout 10
        }

        fabric {
            port 3001
        }

        info {
            port 3003
        }
    }

    namespace sandbox {
        replication-factor 2
        memory-size 100M
        default-ttl 30d # 30 days, use 0 to never expire/evict.

        # To use file storage backing, comment out the line above and use the
        # following lines instead.
        storage-engine device {
            file /opt/data/bar.dat
            filesize 200M
            data-in-memory true # Store data in memory in addition to file.
        }
    }

For nodes 2 and 3, the only difference is in the network service and heartbeat stanzas (node 2 shown):

    network {
        service {
            address any
            access-address 33.33.33.92
            port 3000
        }

        heartbeat {
            mode mesh
            port 3002 # Heartbeat port for this node.
            address 33.33.33.92
            mesh-seed-address-port 33.33.33.91 3002
            interval 250
            timeout 10
        }
    }
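To make the per-node difference explicit, here is a small Python sketch that renders the heartbeat stanza for each of the three boxes. It only reproduces the settings shown above; the function name `heartbeat_stanza` is hypothetical, and treating node 1 as the only seed is an assumption taken from the configs in this post:

```python
def heartbeat_stanza(address, seed=None, port=3002):
    """Render a mesh heartbeat stanza; `seed` is the node to join, if any."""
    lines = [
        "heartbeat {",
        "    mode mesh",
        f"    port {port}",
        f"    address {address}",
    ]
    if seed is not None:
        # Only the joining nodes point at the seed node.
        lines.append(f"    mesh-seed-address-port {seed} {port}")
    lines += ["    interval 250", "    timeout 10", "}"]
    return "\n".join(lines)


for last_octet in (91, 92, 93):
    ip = f"33.33.33.{last_octet}"
    seed = None if last_octet == 91 else "33.33.33.91"
    print(heartbeat_stanza(ip, seed))
    print()
```

Node 1 gets no `mesh-seed-address-port` line, while nodes 2 and 3 both seed from 33.33.33.91 on port 3002.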

When node 2 starts I can see the following in the logs:

    Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::2503) setting replication factors: cluster size 1, paxos single replica limit 1
    Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::2510) {sandbox} replication factor is 1
    Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::3755) global partition state: total 4096 lost 0 unique 4096 duplicate 0
    Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::3756) partition state after fixing lost partitions (master): total 4096 lost 0 unique 4096 duplicate 0
    Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::3757) 0 new partition version tree paths generated
    Mar 13 2015 22:55:43 GMT: INFO (partition): (partition.c::364) ALLOW MIGRATIONS
    Mar 13 2015 22:55:43 GMT: INFO (paxos): (paxos.c::3143) Paxos service ignited: bb9a60c88270008
    Mar 13 2015 22:55:44 GMT: INFO (scan): (thr_tscan.c::2081) started 32 threads
    Mar 13 2015 22:55:44 GMT: INFO (batch): (thr_batch.c::342) Initialize 4 batch worker threads.
    Mar 13 2015 22:55:44 GMT: INFO (drv_ssd): (drv_ssd.c::4316) {sandbox} floor set at 45 wblocks per device
    Mar 13 2015 22:55:48 GMT: INFO (paxos): (paxos.c::3205) paxos supervisor thread started
    Mar 13 2015 22:55:48 GMT: INFO (hb): (hb.c::1961) connecting to remote heartbeat service at 33.33.33.91:3002
    Mar 13 2015 22:55:48 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #7 for thr_demarshal()
    Mar 13 2015 22:55:48 GMT: INFO (ldt): (thr_nsup.c::1153) LDT supervisor started
    Mar 13 2015 22:55:48 GMT: INFO (nsup): (thr_nsup.c::1196) namespace supervisor started
    Mar 13 2015 22:55:48 GMT: INFO (hb): (hb.c::1042) initiated connection to mesh host at 33.33.33.91:3002 socket 60 from 33.33.33.91:3002
    Mar 13 2015 22:55:48 GMT: INFO (demarshal): (thr_demarshal.c::249) Service started: socket 3000
    Mar 13 2015 22:55:49 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #9 for thr_demarshal()
    Mar 13 2015 22:55:49 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #10 for thr_demarshal()
    Mar 13 2015 22:55:49 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #11 for thr_demarshal()
    Mar 13 2015 22:55:50 GMT: INFO (demarshal): (thr_demarshal.c::726) Waiting to spawn demarshal threads ...
    Mar 13 2015 22:55:50 GMT: INFO (demarshal): (thr_demarshal.c::729) Started 4 Demarshal Threads
    Mar 13 2015 22:55:50 GMT: INFO (as): (as.c::449) service ready: soon there will be cake!

On node 1, netstat shows that both nodes 2 and 3 are connected on the heartbeat port:

    tcp        0      0 33.33.33.91:3002        0.0.0.0:*               LISTEN
    tcp        0      0 33.33.33.91:3002        33.33.33.93:49673       ESTABLISHED
    tcp        0      0 33.33.33.91:3002        33.33.33.92:46269       ESTABLISHED
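The same check can be scripted. The sketch below parses the three netstat lines above and lists the peers with an established connection to the heartbeat port; the function name `established_peers` is hypothetical, and it assumes the six-column `netstat -tn` style layout shown here:

```python
netstat_output = """\
tcp        0      0 33.33.33.91:3002        0.0.0.0:*               LISTEN
tcp        0      0 33.33.33.91:3002        33.33.33.93:49673       ESTABLISHED
tcp        0      0 33.33.33.91:3002        33.33.33.92:46269       ESTABLISHED
"""


def established_peers(text, local_port=3002):
    """Return sorted remote IPs with an ESTABLISHED connection to local_port."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        # Columns: proto, recv-q, send-q, local address, remote address, state.
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue
        local, remote = fields[3], fields[4]
        if local.rsplit(":", 1)[1] == str(local_port):
            peers.append(remote.rsplit(":", 1)[0])
    return sorted(peers)


print(established_peers(netstat_output))
```

This only confirms TCP connectivity on port 3002; it says nothing about whether the nodes accept each other into a cluster.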

What am I missing for the cluster to form?


#2

Problem solved:

    network-interface-name eth1

I added the network-interface-name setting. I didn’t think this was required, since the nodes had already established connections on port 3002; they were just not communicating.