I have three Vagrant boxes set up (Ubuntu 12.04, Aerospike 3.5.3) with a namespace that has a replication-factor of 2.
Each Vagrant box has a host-only network set up with the IP addresses 33.33.33.{91-93}.
I start node 1 first, then nodes 2 and 3.
The logs show that nodes 2 and 3 connect to node 1, but asmonitor and asinfo still report a cluster size of 1.
My mesh configuration for the first node (33.33.33.91) is as follows:
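To check the cluster size on each box from a script rather than by eye, the output of asinfo -v statistics can be parsed. This is a minimal Python sketch under two assumptions about this server version: that the response is a semicolon-separated list of key=value pairs, and that the relevant key is named cluster_size. The helpers parse_info_stats and cluster_size are hypothetical, not part of any Aerospike client.

```python
import subprocess


def parse_info_stats(raw: str) -> dict:
    """Split an info response of the form 'k1=v1;k2=v2;...' into a dict."""
    stats = {}
    for pair in raw.strip().split(";"):
        if "=" in pair:
            key, _, value = pair.partition("=")
            stats[key] = value
    return stats


def cluster_size(host: str = "127.0.0.1") -> int:
    """Ask one node for its statistics and return the cluster size it reports."""
    # Assumes the asinfo CLI is on PATH and accepts -h <host> and -v <command>.
    raw = subprocess.run(
        ["asinfo", "-h", host, "-v", "statistics"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Assumption: the statistic is exposed under the key "cluster_size".
    return int(parse_info_stats(raw)["cluster_size"])
```

Usage would be a loop such as for ip in ("33.33.33.91", "33.33.33.92", "33.33.33.93"): print(ip, cluster_size(ip)). A healthy three-node cluster should report 3 on every node.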
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}
logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
network {
    service {
        address any
        port 3000
        access-address 33.33.33.91
    }
    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.
        address 33.33.33.91
        interval 250
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace sandbox {
    replication-factor 2
    memory-size 100M
    default-ttl 30d # 30 days, use 0 to never expire/evict.
    # To use file storage backing, comment out the line above and use the
    # following lines instead.
    storage-engine device {
        file /opt/data/bar.dat
        filesize 200M
        data-in-memory true # Store data in memory in addition to file.
    }
}
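The log from node 2 reports "replication factor is 1" together with "cluster size 1". The rule implied by the paxos-single-replica-limit comment can be sketched in Python as follows. The cap at the cluster size is an assumption about the server's behaviour, and effective_replication_factor is a hypothetical helper, not an Aerospike API.

```python
def effective_replication_factor(configured: int, cluster_size: int,
                                 single_replica_limit: int) -> int:
    """Sketch of how many copies of each partition a namespace ends up keeping.

    At or below paxos-single-replica-limit nodes the factor is forced to 1
    (per the config comment). Otherwise it is assumed to be capped by the
    number of nodes available to hold copies.
    """
    if cluster_size <= single_replica_limit:
        return 1
    return min(configured, cluster_size)
```

With replication-factor 2 and paxos-single-replica-limit 1, a node that only sees itself lands on 1, matching the log. Once all three boxes join, the sandbox namespace should be back to 2.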
For nodes 2 and 3, the only differences are in the network service and heartbeat stanzas. Node 2's are shown here; node 3 is identical except that it uses 33.33.33.93:
network {
    service {
        address any
        access-address 33.33.33.92
        port 3000
    }
    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.
        address 33.33.33.92
        mesh-seed-address-port 33.33.33.91 3002
        interval 250
        timeout 10
    }
}
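To make the per-node differences explicit, here is a hypothetical Python helper, network_stanza, that renders the network block for a given box. It only reproduces the pattern described above: every node uses its own host-only IP, and every node other than 33.33.33.91 seeds the mesh from 33.33.33.91:3002. It is a way to keep the three configs consistent, not a fix for the problem.

```python
from typing import Optional, Tuple

# Node 1 acts as the mesh seed, as in the configs above.
SEED = ("33.33.33.91", 3002)


def network_stanza(node_ip: str, seed: Optional[Tuple[str, int]] = SEED) -> str:
    """Render the network block for one box; the seed node omits the seed line."""
    lines = [
        "network {",
        "    service {",
        "        address any",
        "        port 3000",
        f"        access-address {node_ip}",
        "    }",
        "    heartbeat {",
        "        mode mesh",
        "        port 3002",
        f"        address {node_ip}",
    ]
    # Only non-seed nodes point at the seed's heartbeat port.
    if seed is not None and seed[0] != node_ip:
        lines.append(f"        mesh-seed-address-port {seed[0]} {seed[1]}")
    lines += [
        "        interval 250",
        "        timeout 10",
        "    }",
        "    fabric {",
        "        port 3001",
        "    }",
        "    info {",
        "        port 3003",
        "    }",
        "}",
    ]
    return "\n".join(lines)
```

For example, for octet in (91, 92, 93): print(network_stanza(f"33.33.33.{octet}")) prints the three network blocks side by side for comparison.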
When node 2 starts, I see the following in its log:
Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::2503) setting replication factors: cluster size 1, paxos single replica limit 1
Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::2510) {sandbox} replication factor is 1
Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::3755) global partition state: total 4096 lost 0 unique 4096 duplicate 0
Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::3756) partition state after fixing lost partitions (master): total 4096 lost 0 unique 4096 duplicate 0
Mar 13 2015 22:55:43 GMT: INFO (paxos): (partition.c::3757) 0 new partition version tree paths generated
Mar 13 2015 22:55:43 GMT: INFO (partition): (partition.c::364) ALLOW MIGRATIONS
Mar 13 2015 22:55:43 GMT: INFO (paxos): (paxos.c::3143) Paxos service ignited: bb9a60c88270008
Mar 13 2015 22:55:44 GMT: INFO (scan): (thr_tscan.c::2081) started 32 threads
Mar 13 2015 22:55:44 GMT: INFO (batch): (thr_batch.c::342) Initialize 4 batch worker threads.
Mar 13 2015 22:55:44 GMT: INFO (drv_ssd): (drv_ssd.c::4316) {sandbox} floor set at 45 wblocks per device
Mar 13 2015 22:55:48 GMT: INFO (paxos): (paxos.c::3205) paxos supervisor thread started
Mar 13 2015 22:55:48 GMT: INFO (hb): (hb.c::1961) connecting to remote heartbeat service at 33.33.33.91:3002
Mar 13 2015 22:55:48 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #7 for thr_demarshal()
Mar 13 2015 22:55:48 GMT: INFO (ldt): (thr_nsup.c::1153) LDT supervisor started
Mar 13 2015 22:55:48 GMT: INFO (nsup): (thr_nsup.c::1196) namespace supervisor started
Mar 13 2015 22:55:48 GMT: INFO (hb): (hb.c::1042) initiated connection to mesh host at 33.33.33.91:3002 socket 60 from 33.33.33.91:3002
Mar 13 2015 22:55:48 GMT: INFO (demarshal): (thr_demarshal.c::249) Service started: socket 3000
Mar 13 2015 22:55:49 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #9 for thr_demarshal()
Mar 13 2015 22:55:49 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #10 for thr_demarshal()
Mar 13 2015 22:55:49 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #11 for thr_demarshal()
Mar 13 2015 22:55:50 GMT: INFO (demarshal): (thr_demarshal.c::726) Waiting to spawn demarshal threads ...
Mar 13 2015 22:55:50 GMT: INFO (demarshal): (thr_demarshal.c::729) Started 4 Demarshal Threads
Mar 13 2015 22:55:50 GMT: INFO (as): (as.c::449) service ready: soon there will be cake!
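The "setting replication factors: cluster size N" line from the paxos module is a convenient marker to watch across restarts. This minimal Python sketch pulls every reported cluster size out of a log. It assumes this line is written whenever the replication factors are recomputed, for example after a membership change; that is an assumption, not something the log above proves. cluster_sizes is a hypothetical helper.

```python
import re

# Matches lines like:
#   "... setting replication factors: cluster size 1, paxos single replica limit 1"
CLUSTER_SIZE_RE = re.compile(r"setting replication factors: cluster size (\d+)")


def cluster_sizes(log_lines):
    """Return every cluster size the paxos module reported, in log order."""
    sizes = []
    for line in log_lines:
        match = CLUSTER_SIZE_RE.search(line)
        if match:
            sizes.append(int(match.group(1)))
    return sizes
```

Running it as with open("/var/log/aerospike/aerospike.log") as f: print(cluster_sizes(f)) on each box shows whether any node ever reported a size above 1.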
netstat on node 1 shows that both nodes 2 and 3 are connected to its heartbeat port:
tcp 0 0 33.33.33.91:3002 0.0.0.0:* LISTEN
tcp 0 0 33.33.33.91:3002 33.33.33.93:49673 ESTABLISHED
tcp 0 0 33.33.33.91:3002 33.33.33.92:46269 ESTABLISHED
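The same check can be scripted. This minimal Python sketch lists the remote IPs with an ESTABLISHED connection to node 1's heartbeat port, assuming the usual six-column netstat -tn layout (proto, recv-q, send-q, local, foreign, state). heartbeat_peers is a hypothetical helper. An open socket only shows that the heartbeat connection exists; it does not by itself show that the nodes agreed on membership, which is consistent with asinfo still reporting 1.

```python
def heartbeat_peers(netstat_lines, local_ip="33.33.33.91", port=3002):
    """Return the remote IPs with an ESTABLISHED connection to local_ip:port."""
    local = f"{local_ip}:{port}"
    peers = []
    for line in netstat_lines:
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state.
        if len(fields) >= 6 and fields[3] == local and fields[5] == "ESTABLISHED":
            remote_ip = fields[4].rsplit(":", 1)[0]
            peers.append(remote_ip)
    return sorted(peers)
```

Fed the three lines above, it returns ['33.33.33.92', '33.33.33.93'].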
What am I missing to get the cluster to form?