Vagrant mesh cluster configuration

dpires · March 13, 2015, 10:59pm

I have three Vagrant boxes set up (Ubuntu 12.04, Aerospike 3.5.3), with a namespace configured with a replication factor of 2.

Each Vagrant box has a host-only network with an IP address in 33.33.33.{91-93}.

I start node 1, then nodes 2 and 3.

I can see in the logs that nodes 2 and 3 connect, but asmonitor/asinfo still show a ClusterSize of 1.
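As a rough sketch of how the cluster size could be checked from a script: info responses such as the output of `asinfo -v statistics` are typically a single line of semicolon-separated `key=value` pairs. The helper name `parse_info_pairs` and the sample string are hypothetical, and the `cluster_size` key is an assumption about the statistic's name on this server version.

```python
def parse_info_pairs(raw: str) -> dict:
    """Split a 'k1=v1;k2=v2' info response into a dict of strings."""
    pairs = {}
    for item in raw.strip().split(";"):
        if "=" in item:
            # Split on the first '=' only, in case a value contains '='.
            key, value = item.split("=", 1)
            pairs[key] = value
    return pairs


# Hypothetical sample response; real output has many more keys.
sample = "cluster_size=1;objects=0"
stats = parse_info_pairs(sample)
print("cluster size:", stats.get("cluster_size", "unknown"))
```

Running the same check against each node shows whether every node still believes it is alone.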

My mesh configuration is as follows for the first node:

    service {
        user root
        group root
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        pidfile /var/run/aerospike/asd.pid
        service-threads 4
        transaction-queues 4
        transaction-threads-per-queue 4
        proto-fd-max 15000
    }
    
    logging {
        # Log file must be an absolute path.
        file /var/log/aerospike/aerospike.log {
            context any info
        }
    }
    
    network {
        service {
            address any
            port 3000
            access-address 33.33.33.91
        }
    
        heartbeat {
            mode mesh
            port 3002 # Heartbeat port for this node.
            address 33.33.33.91
            interval 250
            timeout 10
        }
    
        fabric {
            port 3001
        }
    
        info {
            port 3003
        }
    }
    
    namespace sandbox {
        replication-factor 2
        memory-size 100M
        default-ttl 30d # 30 days, use 0 to never expire/evict.
    
        # To use file storage backing, comment out the line above and use the
        # following lines instead.
        storage-engine device {
            file /opt/data/bar.dat
            filesize 200M
            data-in-memory true # Store data in memory in addition to file.
        }
    }

For nodes 2 and 3, the only difference is in the network service and heartbeat stanzas:

    network {
        service {
            address any
            access-address 33.33.33.92
            port 3000
        }
    
        heartbeat {
            mode mesh
            port 3002 # Heartbeat port for this node.
            address 33.33.33.92
            mesh-seed-address-port 33.33.33.91 3002
            interval 250
            timeout 10
        }
    }
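Since the heartbeat stanza differs per node only in its address and seed line, it could be generated rather than hand-edited on each box. The following is a minimal sketch that uses only the directives shown in this post; the function name `heartbeat_stanza` and its parameters are hypothetical.

```python
from typing import Optional


def heartbeat_stanza(address: str, seed: Optional[str] = None, port: int = 3002) -> str:
    """Render a mesh heartbeat stanza; the seed line is omitted for the first node."""
    lines = [
        "heartbeat {",
        "    mode mesh",
        f"    port {port}",
        f"    address {address}",
    ]
    if seed is not None:
        # Nodes 2 and 3 point at node 1 as their mesh seed.
        lines.append(f"    mesh-seed-address-port {seed} {port}")
    lines += ["    interval 250", "    timeout 10", "}"]
    return "\n".join(lines)


print(heartbeat_stanza("33.33.33.92", seed="33.33.33.91"))
```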

When node 2 starts I can see the following in the logs:

    Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::2503) setting replication factors: cluster size 1, paxos single replica limit 1
    Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::2510) {sandbox} replication factor is 1
    Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::3755) global partition state: total 4096 lost 0 unique 4096 duplicate 0
    Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::3756) partition state after fixing lost partitions (master): total 4096 lost 0 unique 4096 duplicate 0
    Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::3757) 0 new partition version tree paths generated
    Mar 13 2015 22:55:43 GMT: INFO (partition): (partition.c::364) ALLOW MIGRATIONS
    Mar 13 2015 22:55:43 GMT: INFO (paxos): (paxos.c::3143) Paxos service ignited: bb9a60c88270008
    Mar 13 2015 22:55:44 GMT: INFO (scan): (thr_tscan.c::2081) started 32 threads
    Mar 13 2015 22:55:44 GMT: INFO (batch): (thr_batch.c::342) Initialize 4 batch worker threads.
    Mar 13 2015 22:55:44 GMT: INFO (drv_ssd): (drv_ssd.c::4316) {sandbox} floor set at 45 wblocks per device
    Mar 13 2015 22:55:48 GMT: INFO (paxos): (paxos.c::3205) paxos supervisor thread started
    Mar 13 2015 22:55:48 GMT: INFO (hb): (hb.c::1961) connecting to remote heartbeat service at 33.33.33.91:3002
    Mar 13 2015 22:55:48 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #7 for thr_demarshal()
    Mar 13 2015 22:55:48 GMT: INFO (ldt): (thr_nsup.c::1153) LDT supervisor started
    Mar 13 2015 22:55:48 GMT: INFO (nsup): (thr_nsup.c::1196) namespace supervisor started
    Mar 13 2015 22:55:48 GMT: INFO (hb): (hb.c::1042) initiated connection to mesh host at 33.33.33.91:3002 socket 60 from 33.33.33.91:3002
    Mar 13 2015 22:55:48 GMT: INFO (demarshal): (thr_demarshal.c::249) Service started: socket 3000
    Mar 13 2015 22:55:49 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #9 for thr_demarshal()
    Mar 13 2015 22:55:49 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #10 for thr_demarshal()
    Mar 13 2015 22:55:49 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #11 for thr_demarshal()
    Mar 13 2015 22:55:50 GMT: INFO (demarshal): (thr_demarshal.c::726) Waiting to spawn demarshal threads ...
    Mar 13 2015 22:55:50 GMT: INFO (demarshal): (thr_demarshal.c::729) Started 4 Demarshal Threads
    Mar 13 2015 22:55:50 GMT: INFO (as): (as.c::449) service ready: soon there will be cake!

netstat on node 1 shows that both node 2 and node 3 are connected:

    tcp        0      0 33.33.33.91:3002        0.0.0.0:*               LISTEN
    tcp        0      0 33.33.33.91:3002        33.33.33.93:49673       ESTABLISHED
    tcp        0      0 33.33.33.91:3002        33.33.33.92:46269       ESTABLISHED
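The same check can be scripted. This is a small sketch that pulls the remote peers with an established connection to the heartbeat port out of netstat-style text; it assumes the six-column `tcp` line layout shown above, and the function name `established_peers` is hypothetical.

```python
def established_peers(netstat_output: str, port: int = 3002) -> list:
    """Return remote IPs with an ESTABLISHED connection to the local port."""
    peers = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state == "ESTABLISHED" and local.rsplit(":", 1)[-1] == str(port):
            peers.append(remote.rsplit(":", 1)[0])
    return peers


sample = """\
tcp        0      0 33.33.33.91:3002        0.0.0.0:*               LISTEN
tcp        0      0 33.33.33.91:3002        33.33.33.93:49673       ESTABLISHED
tcp        0      0 33.33.33.91:3002        33.33.33.92:46269       ESTABLISHED
"""
print(established_peers(sample))
```

An established TCP connection only shows that the sockets are open. It does not show that the nodes have accepted each other into the cluster.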

What am I missing for the cluster to form?

dpires · March 16, 2015, 5:54pm

Problem solved:

    network-interface-name eth1

I added the network-interface-name setting. I didn't think it was required, since the nodes had established connections on port 3002; they were just not communicating.
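One possible explanation, which this thread does not confirm: the server's node ID is generally described as being derived from the fabric port and the MAC address of a network interface. Under that assumption, the node ID from the "Paxos service ignited" log line can be decoded as sketched below. The function name `decode_node_id` and the exact bit layout (port in the high bits, MAC in the low 48 bits, least significant byte first) are assumptions for illustration.

```python
from typing import Tuple


def decode_node_id(node_id_hex: str) -> Tuple[int, str]:
    """Split a hex node ID into (port, MAC), assuming port << 48 | MAC."""
    value = int(node_id_hex, 16)
    port = value >> 48
    # Assumed layout: the low 48 bits hold the MAC, least significant byte first.
    mac_bytes = (value & ((1 << 48) - 1)).to_bytes(6, "little")
    mac = ":".join(f"{b:02x}" for b in mac_bytes)
    return port, mac


port, mac = decode_node_id("bb9a60c88270008")
print(port, mac)  # 3001 08:00:27:88:0c:a6
```

If the same decoded MAC appeared in the node ID on all three boxes, the nodes would be presenting identical IDs, which would be consistent with the cluster not forming until a distinct interface such as eth1 was named.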
