Two Aerospike Docker containers not able to form a cluster

container
docker

#1

Hello,

I am desperately trying to form an Aerospike 3.5.9 Community Edition cluster of two Docker containers running on two different physical hosts whose IPs are 10.5.8.160 and 10.5.8.161. I would like to avoid using the --net=host parameter in the docker run command line mentioned in this thread.

Here is the template configuration file of each aerospike node:

service {
  user root
  group root
  paxos-single-replica-limit 1
  paxos-recovery-policy auto-dun-master
  pidfile /var/run/aerospike/asd.pid
  service-threads 4
  transaction-queues 4
  transaction-threads-per-queue 4
  proto-fd-max 15000
}

logging {
  file /storage/logs/aerospike.log {
    context any info
  }
  console {
    context any info
  }
}
network {
  service {
    address any
    port 3000
    access-address NODE_ADDR virtual
    network-interface-name eth0
  }
  fabric {
    port 3001
  }
  heartbeat {
    mode mesh
    port 3002

    mesh-address PEER_ADDR
    mesh-port 3002
    interval 150
    timeout 20
  }
  info {
    port 3003
  }
}
namespace testdata {
  replication-factor 2
  memory-size 1G
  default-ttl 5d
  storage-engine device {
    file /storage/data/testdata.dat
    filesize 4G
    data-in-memory true
  }
}

I am running each container this way (a basic sed operation in a pre-run script substitutes the configuration template variables with the container environment variables just before asd --foreground is run):

docker run -tid -e NODE_ADDR=10.5.8.160 -e PEER_ADDR=10.5.8.161 -v /opt/docker/aerospike/data:/storage/data -v /opt/docker/aerospike/logs:/storage/logs --name aerospike-node1 -p 3000:3000 -p 3001:3001 -p 3002:3002 -p 3003:3003 <myImage>

and:

docker run -tid -e NODE_ADDR=10.5.8.161 -e PEER_ADDR=10.5.8.160 -v /opt/docker/aerospike/data:/storage/data -v /opt/docker/aerospike/logs:/storage/logs --name aerospike-node2 -p 3000:3000 -p 3001:3001 -p 3002:3002 -p 3003:3003 <myImage>
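For context, the pre-run substitution is roughly the following (a minimal sketch, not my actual script; the /tmp paths and the inlined two-line template are stand-ins for illustration):

```shell
# Sketch of the pre-run sed substitution: replace the NODE_ADDR and
# PEER_ADDR placeholders in the config template with the values passed
# to the container via -e. Paths here are hypothetical stand-ins.
TEMPLATE=/tmp/aerospike.template.conf
CONF=/tmp/aerospike.conf

# Stand-in template containing the two placeholders used above.
cat > "$TEMPLATE" <<'EOF'
    access-address NODE_ADDR virtual
    mesh-address PEER_ADDR
EOF

# In the container these come from the environment set by docker run -e.
NODE_ADDR=10.5.8.160
PEER_ADDR=10.5.8.161

sed -e "s/NODE_ADDR/${NODE_ADDR}/g" \
    -e "s/PEER_ADDR/${PEER_ADDR}/g" \
    "$TEMPLATE" > "$CONF"

cat "$CONF"
```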

Each node starts up fine and heartbeats seem to be sent normally; however, the cluster is not able to form, despite the addition of the “paxos-recovery-policy auto-dun-master” setting in the service stanza.

Here is the log from the container running on 10.5.8.160 (the second container is started a few seconds later, so the first heartbeat connection failure is expected):

May 20 2015 13:22:00 GMT: INFO (cf:misc): (id.c::119) Node ip: 172.17.0.49
May 20 2015 13:22:00 GMT: INFO (cf:misc): (id.c::265) Heartbeat address for mesh: 172.17.0.49
May 20 2015 13:22:00 GMT: INFO (config): (cfg.c::3099) Rack Aware mode not enabled
May 20 2015 13:22:00 GMT: INFO (config): (cfg.c::3102) Node id bb9310011ac4202
May 20 2015 13:22:00 GMT: INFO (namespace): (namespace_cold.c::101) ns testdata beginning COLD start
May 20 2015 13:22:00 GMT: INFO (drv_ssd): (drv_ssd.c::3980) opened file /storage/data/testdata.dat: usable size 4294967296
May 20 2015 13:22:00 GMT: INFO (drv_ssd): (drv_ssd.c::1070) /storage/data/testdata.dat has 4096 wblocks of size 1048576
May 20 2015 13:22:00 GMT: INFO (drv_ssd): (drv_ssd.c::3438) device /storage/data/testdata.dat: reading device to load index
May 20 2015 13:22:00 GMT: INFO (drv_ssd): (drv_ssd.c::3443) In TID 16: Using arena #150 for loading data for namespace "testdata"
May 20 2015 13:22:00 GMT: INFO (drv_ssd): (drv_ssd.c::3464) device /storage/data/testdata.dat: read complete: UNIQUE 0 (REPLACED 0) (GEN 0) (EXPIRED 0) (MAX-TTL 0) records
May 20 2015 13:22:00 GMT: INFO (drv_ssd): (drv_ssd.c::1035) ns testdata loading free & defrag queues
May 20 2015 13:22:00 GMT: INFO (drv_ssd): (drv_ssd.c::969) /storage/data/testdata.dat init defrag profile: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
May 20 2015 13:22:00 GMT: INFO (drv_ssd): (drv_ssd.c::1059) /storage/data/testdata.dat init wblock free-q 4095, defrag-q 0
May 20 2015 13:22:00 GMT: INFO (drv_ssd): (drv_ssd.c::2663) ns testdata starting device maintenance threads
May 20 2015 13:22:00 GMT: INFO (as): (as.c::410) initializing services...
May 20 2015 13:22:00 GMT: INFO (drv_ssd): (drv_ssd.c::1741) ns testdata starting write worker threads
May 20 2015 13:22:00 GMT: INFO (drv_ssd): (drv_ssd.c::886) ns testdata starting defrag threads
May 20 2015 13:22:00 GMT: INFO (tsvc): (thr_tsvc.c::1021) shared queues: 4 queues with 4 threads each
May 20 2015 13:22:00 GMT: INFO (hb): (hb.c::2459) heartbeat socket initialization
May 20 2015 13:22:00 GMT: INFO (hb): (hb.c::2473) initializing mesh heartbeat socket : 172.17.0.49:3002
May 20 2015 13:22:00 GMT: INFO (info): (thr_info.c::5224)  static external network definition
May 20 2015 13:22:00 GMT: INFO (paxos): (paxos.c::3127) partitions from storage: total 4096 found 4096 lost(set) 0 lost(unset) 0
May 20 2015 13:22:00 GMT: INFO (partition): (partition.c::4011) {testdata} 4096 partitions: found 0 absent, 4096 stored
May 20 2015 13:22:00 GMT: INFO (paxos): (paxos.c::3131) Paxos service ignited: bb9310011ac4202
May 20 2015 13:22:01 GMT: INFO (scan): (thr_tscan.c::2075) started 32 threads
May 20 2015 13:22:01 GMT: INFO (batch): (thr_batch.c::342) Initialize 4 batch worker threads.
May 20 2015 13:22:01 GMT: INFO (drv_ssd): (drv_ssd.c::4334) {testdata} floor set at 45 wblocks per device
May 20 2015 13:22:06 GMT: INFO (paxos): (paxos.c::3212) listening for other nodes (max 6000 milliseconds) ...
May 20 2015 13:22:06 GMT: INFO (hb): (hb.c::1971) connecting to remote heartbeat service at 10.5.8.161:3002
May 20 2015 13:22:12 GMT: INFO (hb): (hb.c::1055) initiated connection to mesh host at 10.5.8.161:3002 (10.5.8.161:3002) via socket 58 from 172.17.0.49:40147
May 20 2015 13:22:12 GMT: INFO (paxos): (paxos.c::3229) ... no other nodes detected - node will operate as a single-node cluster
May 20 2015 13:22:12 GMT: INFO (partition): (partition.c::4066) {testdata} 0 absent partitions promoted to master
May 20 2015 13:22:12 GMT: INFO (paxos): (paxos.c::3189) paxos supervisor thread started
May 20 2015 13:22:12 GMT: INFO (nsup): (thr_nsup.c::1289) namespace supervisor started
May 20 2015 13:22:12 GMT: INFO (ldt): (thr_nsup.c::1246) LDT supervisor started
May 20 2015 13:22:12 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #8 for thr_demarshal()
May 20 2015 13:22:12 GMT: INFO (demarshal): (thr_demarshal.c::249) Service started: socket 3000
May 20 2015 13:22:13 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #9 for thr_demarshal()
May 20 2015 13:22:13 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #10 for thr_demarshal()
May 20 2015 13:22:13 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #11 for thr_demarshal()
May 20 2015 13:22:14 GMT: INFO (demarshal): (thr_demarshal.c::726) Waiting to spawn demarshal threads ...
May 20 2015 13:22:14 GMT: INFO (demarshal): (thr_demarshal.c::729) Started 4 Demarshal Threads
May 20 2015 13:22:14 GMT: INFO (as): (as.c::450) service ready: soon there will be cake!
May 20 2015 13:22:18 GMT: INFO (hb): (hb.c::2388) new heartbeat received: bb9120011ac4202 principal node is bb9120011ac4202
May 20 2015 13:22:18 GMT: INFO (fabric): (fabric.c::1780) fabric: node bb9120011ac4202 arrived
May 20 2015 13:22:18 GMT: INFO (paxos): (paxos.c::1933) as_paxos_spark establishing transaction [1.0]@bb9310011ac4202 ClSz = 1 ; # change = 1 : ADD bb9120011ac4202;
May 20 2015 13:22:18 GMT: INFO (partition): (partition.c::392) DISALLOW MIGRATIONS
May 20 2015 13:22:18 GMT: INFO (paxos): (paxos.c::1439) sending sync message to bb9120011ac4202
May 20 2015 13:22:18 GMT: INFO (paxos): (paxos.c::1448) SUCCESSION [1.0]@bb9310011ac4202: bb9310011ac4202 bb9120011ac4202
May 20 2015 13:22:20 GMT: INFO (drv_ssd): (drv_ssd.c::2436) device /storage/data/testdata.dat: used 0, contig-free 4095M (4095 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
May 20 2015 13:22:22 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb9120011ac4202 and self bb9310011ac4202
May 20 2015 13:22:22 GMT: INFO (paxos): (paxos.c::2516) as_paxos_retransmit_check: principal bb9310011ac4202 retransmitting sync messages to nodes that have not responded yet ...
May 20 2015 13:22:22 GMT: INFO (paxos): (paxos.c::1439) sending sync message to bb9120011ac4202
May 20 2015 13:22:22 GMT: INFO (paxos): (paxos.c::1448) SUCCESSION [1.0]@bb9310011ac4202: bb9310011ac4202 bb9120011ac4202
May 20 2015 13:22:24 GMT: INFO (info): (thr_info.c::4744)  system memory: free 5398612kb ( 88 percent free )
May 20 2015 13:22:24 GMT: INFO (info): (thr_info.c::4752)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 0 ::: sub_objects 0
May 20 2015 13:22:24 GMT: INFO (info): (thr_info.c::4760)  rec refs 0 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
May 20 2015 13:22:24 GMT: INFO (info): (thr_info.c::4766)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
May 20 2015 13:22:24 GMT: INFO (info): (thr_info.c::4776)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 0, 0) : hb (2, 4, 2) : fab (17, 18, 1)
May 20 2015 13:22:24 GMT: INFO (info): (thr_info.c::4778)    heartbeat_received: self 0 : foreign 116
May 20 2015 13:22:24 GMT: INFO (info): (thr_info.c::4779)    heartbeat_stats: bt 0 bf 103 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 2 rc 2
May 20 2015 13:22:24 GMT: INFO (info): (thr_info.c::4792)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
May 20 2015 13:22:24 GMT: INFO (info): (thr_info.c::4808) namespace testdata: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
May 20 2015 13:22:24 GMT: INFO (info): (thr_info.c::4853)    partitions: actual 4096 sync 0 desync 0 zombie 0 wait 0 absent 0
May 20 2015 13:22:24 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
May 20 2015 13:22:24 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
May 20 2015 13:22:24 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
May 20 2015 13:22:24 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (0 total) msec
May 20 2015 13:22:24 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
May 20 2015 13:22:24 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
May 20 2015 13:22:24 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
May 20 2015 13:22:27 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb9120011ac4202 and self bb9310011ac4202
May 20 2015 13:22:27 GMT: INFO (paxos): (paxos.c::2516) as_paxos_retransmit_check: principal bb9310011ac4202 retransmitting sync messages to nodes that have not responded yet ...
May 20 2015 13:22:27 GMT: INFO (paxos): (paxos.c::1439) sending sync message to bb9120011ac4202
May 20 2015 13:22:27 GMT: INFO (paxos): (paxos.c::1448) SUCCESSION [1.0]@bb9310011ac4202: bb9310011ac4202 bb9120011ac4202
May 20 2015 13:22:32 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb9120011ac4202 and self bb9310011ac4202
May 20 2015 13:22:32 GMT: INFO (paxos): (paxos.c::2516) as_paxos_retransmit_check: principal bb9310011ac4202 retransmitting sync messages to nodes that have not responded yet ...
May 20 2015 13:22:32 GMT: INFO (paxos): (paxos.c::1439) sending sync message to bb9120011ac4202
May 20 2015 13:22:32 GMT: INFO (paxos): (paxos.c::1448) SUCCESSION [1.0]@bb9310011ac4202: bb9310011ac4202 bb9120011ac4202
May 20 2015 13:22:34 GMT: INFO (info): (thr_info.c::4744)  system memory: free 5418928kb ( 88 percent free )
May 20 2015 13:22:34 GMT: INFO (info): (thr_info.c::4752)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 0 ::: sub_objects 0
May 20 2015 13:22:34 GMT: INFO (info): (thr_info.c::4760)  rec refs 0 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
May 20 2015 13:22:34 GMT: INFO (info): (thr_info.c::4766)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
May 20 2015 13:22:34 GMT: INFO (info): (thr_info.c::4776)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 0, 0) : hb (2, 4, 2) : fab (17, 20, 3)
May 20 2015 13:22:34 GMT: INFO (info): (thr_info.c::4778)    heartbeat_received: self 0 : foreign 248
May 20 2015 13:22:34 GMT: INFO (info): (thr_info.c::4779)    heartbeat_stats: bt 0 bf 235 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 2 rc 2
May 20 2015 13:22:34 GMT: INFO (info): (thr_info.c::4792)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
May 20 2015 13:22:34 GMT: INFO (info): (thr_info.c::4808) namespace testdata: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
May 20 2015 13:22:34 GMT: INFO (info): (thr_info.c::4853)    partitions: actual 4096 sync 0 desync 0 zombie 0 wait 0 absent 0
May 20 2015 13:22:34 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
May 20 2015 13:22:34 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
May 20 2015 13:22:34 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
May 20 2015 13:22:34 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (0 total) msec
May 20 2015 13:22:34 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
May 20 2015 13:22:34 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
May 20 2015 13:22:34 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
May 20 2015 13:22:37 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb9120011ac4202 and self bb9310011ac4202
May 20 2015 13:22:37 GMT: INFO (paxos): (paxos.c::2516) as_paxos_retransmit_check: principal bb9310011ac4202 retransmitting sync messages to nodes that have not responded yet ...
May 20 2015 13:22:37 GMT: INFO (paxos): (paxos.c::1439) sending sync message to bb9120011ac4202
May 20 2015 13:22:37 GMT: INFO (paxos): (paxos.c::1448) SUCCESSION [1.0]@bb9310011ac4202: bb9310011ac4202 bb9120011ac4202
May 20 2015 13:22:40 GMT: INFO (drv_ssd): (drv_ssd.c::2436) device /storage/data/testdata.dat: used 0, contig-free 4095M (4095 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
May 20 2015 13:22:42 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb9120011ac4202 and self bb9310011ac4202
May 20 2015 13:22:42 GMT: INFO (paxos): (paxos.c::2516) as_paxos_retransmit_check: principal bb9310011ac4202 retransmitting sync messages to nodes that have not responded yet ...
May 20 2015 13:22:42 GMT: INFO (paxos): (paxos.c::1439) sending sync message to bb9120011ac4202
May 20 2015 13:22:42 GMT: INFO (paxos): (paxos.c::1448) SUCCESSION [1.0]@bb9310011ac4202: bb9310011ac4202 bb9120011ac4202
May 20 2015 13:22:44 GMT: INFO (info): (thr_info.c::4744)  system memory: free 5382248kb ( 88 percent free )
May 20 2015 13:22:44 GMT: INFO (info): (thr_info.c::4752)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 0 ::: sub_objects 0
May 20 2015 13:22:44 GMT: INFO (info): (thr_info.c::4760)  rec refs 0 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
May 20 2015 13:22:44 GMT: INFO (info): (thr_info.c::4766)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
May 20 2015 13:22:44 GMT: INFO (info): (thr_info.c::4776)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 0, 0) : hb (2, 4, 2) : fab (16, 22, 6)
May 20 2015 13:22:44 GMT: INFO (info): (thr_info.c::4778)    heartbeat_received: self 0 : foreign 382
May 20 2015 13:22:44 GMT: INFO (info): (thr_info.c::4779)    heartbeat_stats: bt 0 bf 369 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 2 rc 2

On the second container I am getting the following log:

May 20 2015 13:19:36 GMT: INFO (cf:misc): (id.c::119) Node ip: 172.17.0.18
May 20 2015 13:19:36 GMT: INFO (cf:misc): (id.c::265) Heartbeat address for mesh: 172.17.0.18
May 20 2015 13:19:36 GMT: INFO (config): (cfg.c::3099) Rack Aware mode not enabled
May 20 2015 13:19:36 GMT: INFO (config): (cfg.c::3102) Node id bb9120011ac4202
May 20 2015 13:19:36 GMT: INFO (namespace): (namespace_cold.c::101) ns testdata beginning COLD start
May 20 2015 13:19:36 GMT: INFO (drv_ssd): (drv_ssd.c::3980) opened file /storage/data/testdata.dat: usable size 4294967296
May 20 2015 13:19:36 GMT: INFO (drv_ssd): (drv_ssd.c::1070) /storage/data/testdata.dat has 4096 wblocks of size 1048576
May 20 2015 13:19:36 GMT: INFO (drv_ssd): (drv_ssd.c::3438) device /storage/data/testdata.dat: reading device to load index
May 20 2015 13:19:36 GMT: INFO (drv_ssd): (drv_ssd.c::3443) In TID 12: Using arena #150 for loading data for namespace "testdata"
May 20 2015 13:19:36 GMT: INFO (drv_ssd): (drv_ssd.c::3464) device /storage/data/testdata.dat: read complete: UNIQUE 0 (REPLACED 0) (GEN 0) (EXPIRED 0) (MAX-TTL 0) records
May 20 2015 13:19:36 GMT: INFO (drv_ssd): (drv_ssd.c::1035) ns testdata loading free & defrag queues
May 20 2015 13:19:36 GMT: INFO (drv_ssd): (drv_ssd.c::969) /storage/data/testdata.dat init defrag profile: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
May 20 2015 13:19:36 GMT: INFO (drv_ssd): (drv_ssd.c::1059) /storage/data/testdata.dat init wblock free-q 4095, defrag-q 0
May 20 2015 13:19:36 GMT: INFO (drv_ssd): (drv_ssd.c::2663) ns testdata starting device maintenance threads
May 20 2015 13:19:36 GMT: INFO (as): (as.c::410) initializing services...
May 20 2015 13:19:36 GMT: INFO (drv_ssd): (drv_ssd.c::1741) ns testdata starting write worker threads
May 20 2015 13:19:36 GMT: INFO (drv_ssd): (drv_ssd.c::886) ns testdata starting defrag threads
May 20 2015 13:19:36 GMT: INFO (tsvc): (thr_tsvc.c::1021) shared queues: 4 queues with 4 threads each
May 20 2015 13:19:36 GMT: INFO (hb): (hb.c::2459) heartbeat socket initialization
May 20 2015 13:19:36 GMT: INFO (hb): (hb.c::2473) initializing mesh heartbeat socket : 172.17.0.18:3002
May 20 2015 13:19:36 GMT: INFO (info): (thr_info.c::5224)  static external network definition
May 20 2015 13:19:36 GMT: INFO (paxos): (paxos.c::3127) partitions from storage: total 4096 found 4096 lost(set) 0 lost(unset) 0
May 20 2015 13:19:36 GMT: INFO (partition): (partition.c::4011) {testdata} 4096 partitions: found 0 absent, 4096 stored
May 20 2015 13:19:36 GMT: INFO (paxos): (paxos.c::3131) Paxos service ignited: bb9120011ac4202
May 20 2015 13:19:36 GMT: INFO (scan): (thr_tscan.c::2075) started 32 threads
May 20 2015 13:19:36 GMT: INFO (batch): (thr_batch.c::342) Initialize 4 batch worker threads.
May 20 2015 13:19:36 GMT: INFO (drv_ssd): (drv_ssd.c::4334) {testdata} floor set at 45 wblocks per device
May 20 2015 13:19:42 GMT: INFO (paxos): (paxos.c::3212) listening for other nodes (max 6000 milliseconds) ...
May 20 2015 13:19:42 GMT: INFO (hb): (hb.c::1971) connecting to remote heartbeat service at 10.5.8.160:3002
May 20 2015 13:19:42 GMT: INFO (hb): (hb.c::1055) initiated connection to mesh host at 10.5.8.160:3002 (10.5.8.160:3002) via socket 59 from 172.17.0.18:49048
May 20 2015 13:19:42 GMT: INFO (paxos): (paxos.c::3220) ... other node(s) detected - node will operate in a multi-node cluster
May 20 2015 13:19:42 GMT: INFO (paxos): (paxos.c::3189) paxos supervisor thread started
May 20 2015 13:19:42 GMT: INFO (nsup): (thr_nsup.c::1289) namespace supervisor started
May 20 2015 13:19:42 GMT: INFO (ldt): (thr_nsup.c::1246) LDT supervisor started
May 20 2015 13:19:42 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #8 for thr_demarshal()
May 20 2015 13:19:42 GMT: INFO (hb): (hb.c::2388) new heartbeat received: bb9310011ac4202 principal node is bb9310011ac4202
May 20 2015 13:19:42 GMT: INFO (fabric): (fabric.c::1780) fabric: node bb9310011ac4202 arrived
May 20 2015 13:19:42 GMT: INFO (paxos): (paxos.c::2151) Skip node arrival bb9310011ac4202 cluster principal bb9120011ac4202 pulse principal bb9310011ac4202
May 20 2015 13:19:42 GMT: INFO (demarshal): (thr_demarshal.c::249) Service started: socket 3000
May 20 2015 13:19:43 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #10 for thr_demarshal()
May 20 2015 13:19:43 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #11 for thr_demarshal()
May 20 2015 13:19:43 GMT: INFO (demarshal): (thr_demarshal.c::221) Saved original JEMalloc arena #12 for thr_demarshal()
May 20 2015 13:19:44 GMT: INFO (demarshal): (thr_demarshal.c::726) Waiting to spawn demarshal threads ...
May 20 2015 13:19:44 GMT: INFO (demarshal): (thr_demarshal.c::729) Started 4 Demarshal Threads
May 20 2015 13:19:44 GMT: INFO (as): (as.c::450) service ready: soon there will be cake!
May 20 2015 13:19:47 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb9310011ac4202 and self bb9120011ac4202
May 20 2015 13:19:52 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb9310011ac4202 and self bb9120011ac4202
May 20 2015 13:19:54 GMT: INFO (info): (thr_info.c::4744)  system memory: free 5432856kb ( 88 percent free )
May 20 2015 13:19:54 GMT: INFO (info): (thr_info.c::4752)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 0 ::: sub_objects 0
May 20 2015 13:19:54 GMT: INFO (info): (thr_info.c::4760)  rec refs 0 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
May 20 2015 13:19:54 GMT: INFO (info): (thr_info.c::4766)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
May 20 2015 13:19:54 GMT: INFO (info): (thr_info.c::4776)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 0, 0) : hb (2, 2, 0) : fab (16, 16, 0)
May 20 2015 13:19:54 GMT: INFO (info): (thr_info.c::4778)    heartbeat_received: self 0 : foreign 200
May 20 2015 13:19:54 GMT: INFO (info): (thr_info.c::4779)    heartbeat_stats: bt 0 bf 159 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
May 20 2015 13:19:54 GMT: INFO (info): (thr_info.c::4792)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
May 20 2015 13:19:54 GMT: INFO (info): (thr_info.c::4808) namespace testdata: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
May 20 2015 13:19:54 GMT: INFO (info): (thr_info.c::4853)    partitions: actual 4096 sync 0 desync 0 zombie 0 wait 0 absent 0
May 20 2015 13:19:54 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
May 20 2015 13:19:54 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
May 20 2015 13:19:54 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
May 20 2015 13:19:54 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (0 total) msec
May 20 2015 13:19:54 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
May 20 2015 13:19:54 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
May 20 2015 13:19:54 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
May 20 2015 13:19:56 GMT: INFO (drv_ssd): (drv_ssd.c::2436) device /storage/data/testdata.dat: used 0, contig-free 4095M (4095 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
May 20 2015 13:19:57 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb9310011ac4202 and self bb9120011ac4202
May 20 2015 13:20:02 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb9310011ac4202 and self bb9120011ac4202
May 20 2015 13:20:04 GMT: INFO (info): (thr_info.c::4744)  system memory: free 5409236kb ( 88 percent free )
May 20 2015 13:20:04 GMT: INFO (info): (thr_info.c::4752)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 0 ::: sub_objects 0
May 20 2015 13:20:04 GMT: INFO (info): (thr_info.c::4760)  rec refs 0 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
May 20 2015 13:20:04 GMT: INFO (info): (thr_info.c::4766)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
May 20 2015 13:20:04 GMT: INFO (info): (thr_info.c::4776)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 0, 0) : hb (2, 2, 0) : fab (16, 16, 0)
May 20 2015 13:20:04 GMT: INFO (info): (thr_info.c::4778)    heartbeat_received: self 0 : foreign 334
May 20 2015 13:20:04 GMT: INFO (info): (thr_info.c::4779)    heartbeat_stats: bt 0 bf 293 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
May 20 2015 13:20:04 GMT: INFO (info): (thr_info.c::4792)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
May 20 2015 13:20:04 GMT: INFO (info): (thr_info.c::4808) namespace testdata: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
May 20 2015 13:20:04 GMT: INFO (info): (thr_info.c::4853)    partitions: actual 4096 sync 0 desync 0 zombie 0 wait 0 absent 0
May 20 2015 13:20:04 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
May 20 2015 13:20:04 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
May 20 2015 13:20:04 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
May 20 2015 13:20:04 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (0 total) msec
May 20 2015 13:20:04 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
May 20 2015 13:20:04 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
May 20 2015 13:20:04 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
May 20 2015 13:20:07 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb9310011ac4202 and self bb9120011ac4202
May 20 2015 13:20:12 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb9310011ac4202 and self bb9120011ac4202
May 20 2015 13:20:12 GMT: INFO (hb): (hb.c::2319) HB node bb9310011ac4202 in different cluster - succession lists don't match
May 20 2015 13:20:14 GMT: INFO (info): (thr_info.c::4744)  system memory: free 5422816kb ( 88 percent free )
May 20 2015 13:20:14 GMT: INFO (info): (thr_info.c::4752)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 0 ::: sub_objects 0
May 20 2015 13:20:14 GMT: INFO (info): (thr_info.c::4760)  rec refs 0 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
May 20 2015 13:20:14 GMT: INFO (info): (thr_info.c::4766)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
May 20 2015 13:20:14 GMT: INFO (info): (thr_info.c::4776)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 0, 0) : hb (2, 2, 0) : fab (16, 16, 0)
May 20 2015 13:20:14 GMT: INFO (info): (thr_info.c::4778)    heartbeat_received: self 0 : foreign 466
May 20 2015 13:20:14 GMT: INFO (info): (thr_info.c::4779)    heartbeat_stats: bt 0 bf 425 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
May 20 2015 13:20:14 GMT: INFO (info): (thr_info.c::4792)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
May 20 2015 13:20:14 GMT: INFO (info): (thr_info.c::4808) namespace testdata: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 99

For some reason the nodes are trying to synchronize, but it does not work at all as expected. Can you help me troubleshoot the issue so that the cluster can form successfully?

Thanks in advance,

Julien


#2

Thanks for bringing this to our attention. We will try and reproduce the issue.

Just curious: did you also try setting the interface-address in the heartbeat stanza?


#3

I tried adding the host address in the heartbeat stanza, but in that case Aerospike refused to launch and aborted the process. I also tried using the virtual keyword in that setting, but it is not recognized.

Thanks

Julien


#4

Could you try setting the container's IP address in the heartbeat stanza? Both address and interface-address should be the actual container's IP address if you don't want to use --net=host. The container's IP address will be different from 10.5.8.160 and 10.5.8.161, and it will change each time you do a docker run. From the logs, it looks like your containers' IPs are 172.17.0.49 and 172.17.0.18, respectively.

heartbeat {
  mode mesh
  port 3002
  address IP_OF_CONTAINER
  interface-address IP_OF_CONTAINER

  mesh-address PEER_ADDR
  mesh-port 3002
  interval 150
  timeout 20
}

Running netstat in the container can also show you which IP is listening on port 3002:

sudo docker exec -it aerospike1 bash
root@49ca39823c14:/# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:ac:11:00:06  
          inet addr:172.17.0.6  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe11:6/64 Scope:Link
          UP BROADCAST RUNNING  MTU:1500  Metric:1
          RX packets:7 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:578 (578.0 B)  TX bytes:578 (578.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

root@49ca39823c14:/# netstat -ant
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:3000            0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:3001            0.0.0.0:*               LISTEN     
tcp        0      0 172.17.0.6:3002         0.0.0.0:*               LISTEN     
tcp        0      0 0.0.0.0:3003            0.0.0.0:*               LISTEN
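If you script the configuration generation, the container's own IP can be derived from the `ip addr` output at startup. A sketch (the `extract_ipv4` helper is hypothetical, and it is demonstrated here against captured sample output rather than a live interface, so the parsing is reproducible anywhere):

```shell
# Hypothetical helper: pull the first IPv4 address out of
# `ip -4 addr show eth0`-style output (the part before the /prefix).
extract_ipv4() {
  awk '/inet /{split($2, a, "/"); print a[1]; exit}'
}

# Sample output as seen in the containers discussed above; inside the
# container you would pipe the real command instead:
#   ip -4 addr show eth0 | extract_ipv4
sample='165: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    inet 172.17.0.51/16 scope global eth0'

CONTAINER_IP=$(printf '%s\n' "$sample" | extract_ipv4)
echo "$CONTAINER_IP"   # 172.17.0.51
```

From the host, `docker inspect --format '{{ .NetworkSettings.IPAddress }}' aerospike-node1` should give the same address, which can then be fed into the config template.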

#5

Thanks for your prompt answers and suggestions. I have not yet tried specifying the container IP in address and interface-address; this requires additional tuning in my pre-run scripts to gather that information on the fly :-)

However, I fear this would not be sufficient, considering the IP state I was able to gather through a ‘docker exec aerospike-node1 ss -ant’ (netstat is apparently deprecated on the CentOS container running Aerospike).

aerospike-node1:

root@aerospike160f:/opt/docker/aerospike# docker exec aerospike-node1 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
165: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:11:00:33 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.51/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe11:33/64 scope link
       valid_lft forever preferred_lft forever
root@aerospike160f:/opt/docker/aerospike# docker exec aerospike-node1 ss -ant
State      Recv-Q Send-Q        Local Address:Port          Peer Address:Port
LISTEN     0      128                       *:3000                     *:*
LISTEN     0      128                       *:3001                     *:*
LISTEN     0      128             172.17.0.51:3002                     *:*
LISTEN     0      128                       *:3003                     *:*
ESTAB      0      0               172.17.0.51:3002            10.5.8.161:56276
ESTAB      0      0               172.17.0.51:42266           10.5.8.161:3002
SYN-SENT   0      1               172.17.0.51:49843          172.17.0.19:3001

aerospike-node2:

root@aerospike161f:/opt/docker/aerospike# docker exec aerospike-node2 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
41: eth0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:ac:11:00:13 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.19/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe11:13/64 scope link
       valid_lft forever preferred_lft forever
root@aerospike161f:/opt/docker/aerospike# docker exec aerospike-node2 ss -pant
State      Recv-Q Send-Q        Local Address:Port          Peer Address:Port
LISTEN     0      128                       *:3000                     *:*
LISTEN     0      128                       *:3001                     *:*
LISTEN     0      128             172.17.0.19:3002                     *:*
LISTEN     0      128                       *:3003                     *:*
ESTAB      0      0               172.17.0.19:3002            10.5.8.160:42266
ESTAB      0      0               172.17.0.19:56276           10.5.8.160:3002

The two established connections seem to indicate that the unicast heartbeat is operating correctly, don't you think? Besides, the listening sockets on port 3002 are correctly bound to the containers' private IP addresses.

If you still consider it worth testing with the additional container IP in the heartbeat stanza, just let me know; there is probably technical stuff I'm not yet fully aware of regarding Docker and/or Aerospike.

Thanks

Julien


#6

Since you are using different physical hosts, the private IPs used within the containers are not routable to one another. There may be a way to use routable IPs within a Docker container.

We’ve tested our cluster using --net=host, which forces the container to use the host’s IP address, which is routable between the two physical hosts.

I would suggest using --net=host, or creating some sort of overlay network using Weave or Flannel

or

https://coreos.com/blog/introducing-rudder/

I’ll also keep looking for an alternative.
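For reference, the --net=host variant of the earlier run command would look roughly like this (a sketch adapted from the command lines above; with host networking, the -p port mappings become redundant since the daemon binds directly on the host's interfaces):

```shell
# Hypothetical --net=host variant of the original run command: the
# container shares the host's network stack, so no -p mappings are
# needed and NODE_ADDR can be the host IP itself.
docker run -tid --net=host \
  -e NODE_ADDR=10.5.8.160 -e PEER_ADDR=10.5.8.161 \
  -v /opt/docker/aerospike/data:/storage/data \
  -v /opt/docker/aerospike/logs:/storage/logs \
  --name aerospike-node1 <myImage>
```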


#7

OK. Of course, rerunning the containers with --net=host restores cluster connectivity and everything works well.

So, in short, could you just confirm that, for the time being, the use of the virtual keyword for the service address makes it possible to drop the --net=host argument in the following use cases?

  • single-node cluster running in one container
  • multi-node cluster running in several containers on the same host

Thanks

Julien


#8

You are correct: at present, in both of these cases you do not need to use --net=host, and you can also use the virtual keyword in cases where you want to use the main host’s IP address.

  1. single-node cluster running in one container
  2. multi-node cluster running in several containers on the same host

#9

Does an overlay network support multicast mode for setting up a cluster with Docker images?