Good day!
I'm having trouble getting Aerospike to form a cluster using either multicast or mesh heartbeats. I have 3 nodes with Aerospike installed and managed by Ansible. I have confirmed that multicast works using iperf (output below), but it seems like Aerospike is sending data out of order.
Aerospike itself starts up without errors, but CLUSTER-SIZE is always 1.
Aerospike Version: Aerospike Community Edition build 3.15.0.2
OS: CentOS Linux release 7.4.1708 (Core)
IPs: 172.28.128.17, 172.28.128.19, 172.28.128.20
aerospike.conf (on 172.28.128.17):
service {
    proto-fd-max 15000
    paxos-single-replica-limit 1
}

logging {
    console {
        context any info
    }
}

network {
    service {
        address 172.28.128.17
        reuse-address
        access-address 172.28.128.17
        port 3000
    }

    heartbeat {
        mode multicast
        multicast-group 239.1.99.222
        port 9918
        address 172.28.128.17
        interval 150
        timeout 10
    }

    fabric {
        address 172.28.128.17
        port 3001
    }

    info {
        address 172.28.128.17
        port 3003
    }
}

namespace mem_cache {
    replication-factor 2
    memory-size 4G
    default-ttl 0
    storage-engine memory
}
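For reference, this is how I read the heartbeat timing in the stanza above, assuming `interval` is the gap between heartbeats in milliseconds and `timeout` is the number of consecutive missed heartbeats before a peer is declared dead. `heartbeat_grace_ms` is just my own helper name, not an Aerospike API.

```python
# Sketch of the heartbeat timing implied by the stanza above.
# Assumption: `interval` is milliseconds between heartbeats and `timeout`
# is the number of consecutive missed heartbeats before a peer is dropped.


def heartbeat_grace_ms(interval_ms: int, timeout_beats: int) -> int:
    """Milliseconds of silence after which a peer is treated as gone."""
    if interval_ms <= 0 or timeout_beats <= 0:
        raise ValueError("interval and timeout must be positive")
    return interval_ms * timeout_beats


# Values from the heartbeat stanza: interval 150, timeout 10.
print(heartbeat_grace_ms(150, 10), "ms")  # 1500 ms
```

So with these settings a peer only needs to be silent for about 1.5 seconds to be dropped, which is short but should not prevent a cluster from forming in the first place.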
firewall-cmd --list-ports
5353/udp 161/udp 3000/tcp 3001/tcp 3003/tcp 9918/udp 3002/tcp
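3002/tcp is the port conventionally used for mesh heartbeats. For the mesh attempt, each node needs a heartbeat stanza that seeds the other nodes. Below is a sketch of generating one (in an Ansible setup this would normally be a Jinja2 template). It assumes mesh heartbeats on port 3002 and the `mesh-seed-address-port` directive; `mesh_heartbeat_stanza` is a hypothetical helper, and this sketch seeds only the peers, not the node itself.

```python
# Hypothetical helper: render a mesh-mode heartbeat stanza for one node.
# Assumptions: mesh heartbeats on port 3002 (open in firewalld above) and
# peers listed with `mesh-seed-address-port <ip> <port>`.

NODES = ["172.28.128.17", "172.28.128.19", "172.28.128.20"]


def mesh_heartbeat_stanza(self_ip, nodes, port=3002, interval=150, timeout=10):
    """Return an aerospike.conf heartbeat block for `self_ip` in mesh mode."""
    if self_ip not in nodes:
        raise ValueError(f"{self_ip} is not one of the cluster nodes")
    lines = [
        "heartbeat {",
        "    mode mesh",
        f"    address {self_ip}",
        f"    port {port}",
    ]
    # This sketch seeds every peer but skips the node's own address.
    for peer in nodes:
        if peer != self_ip:
            lines.append(f"    mesh-seed-address-port {peer} {port}")
    lines += [
        f"    interval {interval}",
        f"    timeout {timeout}",
        "}",
    ]
    return "\n".join(lines)


print(mesh_heartbeat_stanza("172.28.128.17", NODES))
```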
tcpdump -i eth1 | grep 9918
15:40:29.826564 IP aerospike1.local.9918 > 239.1.99.222.9918: UDP, length 112
15:40:29.827115 IP 172.28.128.19.9918 > 239.1.99.222.9918: UDP, length 112
15:40:29.977982 IP 172.28.128.20.9918 > 239.1.99.222.9918: UDP, length 112
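The capture shows heartbeats from all three nodes arriving on eth1. To list the distinct senders over a longer capture, a quick parse of tcpdump's default UDP summary lines is enough. `aerospike1.local` is presumably this node (172.28.128.17) shown by name; running `tcpdump -n` would print the raw IP instead. `heartbeat_senders` is a hypothetical helper.

```python
import re

# Matches tcpdump's one-line UDP summary: "IP src.port > dst.port: UDP, length N".
# The greedy (\S+) backtracks so the last dotted field is taken as the port.
UDP_LINE = re.compile(r"IP (\S+)\.(\d+) > (\S+)\.(\d+): UDP, length (\d+)")


def heartbeat_senders(lines, group="239.1.99.222", port=9918):
    """Return the set of sources seen sending to the multicast group/port."""
    senders = set()
    for line in lines:
        m = UDP_LINE.search(line)
        if m and m.group(3) == group and int(m.group(4)) == port:
            senders.add(m.group(1))
    return senders


capture = """\
15:40:29.826564 IP aerospike1.local.9918 > 239.1.99.222.9918: UDP, length 112
15:40:29.827115 IP 172.28.128.19.9918 > 239.1.99.222.9918: UDP, length 112
15:40:29.977982 IP 172.28.128.20.9918 > 239.1.99.222.9918: UDP, length 112
""".splitlines()

print(sorted(heartbeat_senders(capture)))
```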
ip maddr
3: eth1
link 01:00:5e:00:00:01
link 33:33:00:00:00:01
link 33:33:ff:a2:a5:65
link 01:00:5e:00:00:fb
link 01:00:5e:01:63:de
inet 239.1.99.222
inet 224.0.0.251
inet 224.0.0.1
inet6 ff02::1:ffa2:a565
inet6 ff02::1
inet6 ff01::100:
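The `link 01:00:5e:01:63:de` entry is the Ethernet address for group 239.1.99.222 under the standard IPv4 multicast mapping (RFC 1112: the fixed prefix 01:00:5e followed by the low 23 bits of the group address), so eth1 has joined the heartbeat group at the link layer. A short sketch of that mapping, using only the standard library:

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC (RFC 1112):
    prefix 01:00:5e followed by the low 23 bits of the group address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF
    octets = [0x01, 0x00, 0x5E, (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return ":".join(f"{o:02x}" for o in octets)


for g in ("239.1.99.222", "224.0.0.251", "224.0.0.1"):
    print(g, "->", multicast_mac(g))
# 239.1.99.222 -> 01:00:5e:01:63:de
# 224.0.0.251 -> 01:00:5e:00:00:fb
# 224.0.0.1 -> 01:00:5e:00:00:01
```

All three results match `link` entries in the `ip maddr` output above.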
iperf -s -u -B 239.1.99.222 -i 1 -p 9918
------------------------------------------------------------
Server listening on UDP port 9918
Binding to local address 239.1.99.222
Joining multicast group 239.1.99.222
Receiving 1470 byte datagrams
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 3] local 239.1.99.222 port 9918 connected with 172.28.128.19 port 9918
[ 4] local 239.1.99.222 port 9918 connected with 172.28.128.17 port 9918
[ 5] local 239.1.99.222 port 9918 connected with 172.28.128.20 port 9918
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0- 1.0 sec 784 Bytes 6.27 Kbits/sec 49.145 ms 99/ 106 (93%)
[ 3] 0.00-1.00 sec 6 datagrams received out-of-order
[ 4] 0.0- 1.0 sec 784 Bytes 6.27 Kbits/sec 48.463 ms 99/ 106 (93%)
[ 4] 0.00-1.00 sec 6 datagrams received out-of-order
[ 5] 0.0- 1.0 sec 784 Bytes 6.27 Kbits/sec 48.499 ms 99/ 106 (93%)
[ 5] 0.00-1.00 sec 6 datagrams received out-of-order
[ 3] 1.0- 2.0 sec 784 Bytes 6.27 Kbits/sec 87.291 ms 0/ 0 (-nan%)
[ 3] 1.00-2.00 sec 7 datagrams received out-of-order
[ 4] 1.0- 2.0 sec 784 Bytes 6.27 Kbits/sec 86.029 ms 0/ 0 (-nan%)
[ 4] 1.00-2.00 sec 7 datagrams received out-of-order
[ 5] 1.0- 2.0 sec 784 Bytes 6.27 Kbits/sec 85.953 ms 0/ 0 (-nan%)
[ 5] 1.00-2.00 sec 7 datagrams received out-of-order
[ 3] 2.0- 3.0 sec 672 Bytes 5.38 Kbits/sec 107.921 ms 0/ 0 (-nan%)
[ 3] 2.00-3.00 sec 6 datagrams received out-of-order
[ 4] 2.0- 3.0 sec 672 Bytes 5.38 Kbits/sec 107.055 ms 0/ 0 (-nan%)
[ 4] 2.00-3.00 sec 6 datagrams received out-of-order
[ 5] 2.0- 3.0 sec 672 Bytes 5.38 Kbits/sec 107.033 ms 0/ 0 (-nan%)
[ 5] 2.00-3.00 sec 6 datagrams received out-of-order
[ 3] 3.0- 4.0 sec 784 Bytes 6.27 Kbits/sec 123.574 ms 0/ 0 (-nan%)
[ 3] 3.00-4.00 sec 7 datagrams received out-of-order
[ 4] 3.0- 4.0 sec 784 Bytes 6.27 Kbits/sec 123.003 ms 0/ 0 (-nan%)
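A plain socket listener, roughly equivalent to the iperf command above but only printing who sent each datagram, is sketched below. It assumes Linux semantics (binding to the group address, as `iperf -B` does) and uses 172.28.128.17 as the eth1 address to join on; the function names are hypothetical. Sharing port 9918 with a running `asd` via SO_REUSEADDR may only work if `asd` also set that option on its socket.

```python
import socket
import struct


def build_mreq(group: str, iface_ip: str) -> bytes:
    """ip_mreq for IP_ADD_MEMBERSHIP: group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def open_listener(group: str, port: int, iface_ip: str) -> socket.socket:
    """UDP socket bound to group:port and joined to `group` on the interface owning `iface_ip`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Let other listeners share the port (only if they also set SO_REUSEADDR).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Binding to the group address filters out unrelated unicast traffic (Linux behavior).
    sock.bind((group, port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group, iface_ip))
    return sock


def print_senders(sock: socket.socket, count: int = 30) -> None:
    """Print the size and source address of the next `count` datagrams."""
    for _ in range(count):
        data, (src_ip, src_port) = sock.recvfrom(2048)
        print(f"{len(data)} bytes from {src_ip}:{src_port}")
```

Usage on 172.28.128.17 while the other nodes are running: `print_senders(open_listener("239.1.99.222", 9918, "172.28.128.17"))`. If 172.28.128.19 and 172.28.128.20 show up, the kernel is delivering their heartbeats to user-space sockets on this node.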
Relevant logs (from 172.28.128.17):
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533)
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) service {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) proto-fd-max 15000
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) paxos-single-replica-limit 1
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) }
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533)
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) logging {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) console {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) context any info
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) }
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) }
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533)
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) network {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) service {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) address 172.28.128.17
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) reuse-address
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) access-address 172.28.128.17
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) port 3000
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) }
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533)
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) heartbeat {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) mode multicast
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) multicast-group 239.1.99.222
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) port 9918
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) address 172.28.128.17
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) interval 150
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) timeout 10
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) }
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533)
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) fabric {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) address 172.28.128.17
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) port 3001
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) }
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533)
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) info {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) address 172.28.128.17
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) port 3003
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) }
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) }
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) namespace mem_cache {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) replication-factor 2
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) memory-size 4G
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) default-ttl 0
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) storage-engine memory
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) }
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3553) system file descriptor limit: 100000, proto-fd-max: 15000
Nov 14 2017 20:44:02 GMT: INFO (hardware): (hardware.c:1785) detected 3 CPU(s), 3 core(s), 1 NUMA node(s)
Nov 14 2017 20:44:02 GMT: INFO (socket): (socket.c:2566) Node port 3001, node ID bb98be4ca005452
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3594) node-id bb98be4ca005452
Nov 14 2017 20:44:02 GMT: INFO (namespace): (namespace_ce.c:96) {mem_cache} beginning COLD start
Nov 14 2017 20:44:02 GMT: INFO (as): (as.c:372) initializing services...
Nov 14 2017 20:44:02 GMT: INFO (tsvc): (thr_tsvc.c:118) 4 transaction queues: starting 4 threads per queue
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:785) updated fabric published address list to {172.28.128.17:3001}
Nov 14 2017 20:44:02 GMT: INFO (partition): (partition_balance.c:273) {mem_cache} 4096 partitions: found 4096 absent, 0 stored
Nov 14 2017 20:44:02 GMT: INFO (batch): (batch.c:597) starting 3 batch-index-threads
Nov 14 2017 20:44:02 GMT: INFO (batch): (thr_batch.c:374) starting 4 batch-threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:475) starting 8 fabric send threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:492) starting 16 fabric rw channel recv threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:492) starting 4 fabric ctrl channel recv threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:492) starting 4 fabric bulk channel recv threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:492) starting 4 fabric meta channel recv threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:498) starting fabric accept thread
Nov 14 2017 20:44:02 GMT: INFO (hb): (hb.c:7537) initializing multicast heartbeat socket: 239.1.99.222:9918
Nov 14 2017 20:44:02 GMT: INFO (socket): (socket.c:1342) Setting multicast interface address: 172.28.128.17
Nov 14 2017 20:44:02 GMT: INFO (socket): (socket.c:1374) Joining multicast group: 239.1.99.222
Nov 14 2017 20:44:02 GMT: INFO (hb): (hb.c:7569) mtu of the network is 1500
Nov 14 2017 20:44:02 GMT: INFO (hb): (socket.c:1410) Started multicast heartbeat endpoint 172.28.128.17:9918
Nov 14 2017 20:44:02 GMT: INFO (nsup): (thr_nsup.c:1166) starting namespace supervisor threads
Nov 14 2017 20:44:02 GMT: INFO (demarshal): (thr_demarshal.c:894) starting 3 demarshal threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (socket.c:708) Started fabric endpoint 172.28.128.17:3001
Nov 14 2017 20:44:02 GMT: INFO (demarshal): (socket.c:708) Started client endpoint 172.28.128.17:3000
Nov 14 2017 20:44:02 GMT: INFO (demarshal): (socket.c:708) Started client endpoint 127.0.0.1:3000
Nov 14 2017 20:44:02 GMT: INFO (info-port): (thr_info_port.c:307) starting info port thread
Nov 14 2017 20:44:02 GMT: INFO (info-port): (socket.c:708) Started info endpoint 172.28.128.17:3003
Nov 14 2017 20:44:02 GMT: INFO (as): (as.c:415) service ready: soon there will be cake!
Nov 14 2017 20:44:02 GMT: INFO (info): (thr_info.c:3453) Aerospike Telemetry Agent: Aerospike anonymous data collection is ACTIVE. For further information, see http://aerospike.com/aerospike-telemetry
Nov 14 2017 20:44:04 GMT: INFO (clustering): (clustering.c:7804) principal node - forming new cluster with succession list: bb98be4ca005452
Nov 14 2017 20:44:04 GMT: INFO (clustering): (clustering.c:5517) applied new cluster key c7d50c8da169
Nov 14 2017 20:44:04 GMT: INFO (clustering): (clustering.c:7804) applied new succession list bb98be4ca005452
Nov 14 2017 20:44:04 GMT: INFO (clustering): (clustering.c:5521) applied cluster size 1
Nov 14 2017 20:44:04 GMT: INFO (exchange): (exchange.c:1977) data exchange started with cluster key c7d50c8da169
Nov 14 2017 20:44:04 GMT: INFO (exchange): (exchange.c:2615) received commit command from principal node bb98be4ca005452
Nov 14 2017 20:44:04 GMT: INFO (exchange): (exchange.c:2554) data exchange completed with cluster key c7d50c8da169
Nov 14 2017 20:44:04 GMT: INFO (partition): (partition_balance.c:1000) {mem_cache} replication factor is 1
Nov 14 2017 20:44:04 GMT: INFO (partition): (partition_balance.c:974) {mem_cache} rebalanced: expected-migrations (0,0) expected-signals 0
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:164) NODE-ID bb98be4ca005452 CLUSTER-SIZE 1
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:241) system-memory: free-kbytes 345124 free-pct 69 heap-kbytes (1087685,1088156,1114112) heap-efficiency-pct 97.6
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:255) in-progress: tsvc-q 0 info-q 0 nsup-delete-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:277) fds: proto (0,3,3) heartbeat (0,0,0) fabric (0,0,0)
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:286) heartbeat-received: self 198 foreign 0
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:316) fabric-bytes-per-second: bulk (0,0) ctrl (0,0) meta (0,0) rw (0,0)
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:371) {mem_cache} objects: all 0 master 0 prole 0 non-replica 0
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:416) {mem_cache} migrations: complete
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:435) {mem_cache} memory-usage: total-bytes 0 index-bytes 0 sindex-bytes 0 data-bytes 0 used-pct 0.00
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:164) NODE-ID bb98be4ca005452 CLUSTER-SIZE 1
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:241) system-memory: free-kbytes 345096 free-pct 69 heap-kbytes (1087685,1088156,1114112) heap-efficiency-pct 97.6
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:255) in-progress: tsvc-q 0 info-q 0 nsup-delete-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:277) fds: proto (0,3,3) heartbeat (0,0,0) fabric (0,0,0)
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:286) heartbeat-received: self 398 foreign 0
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:316) fabric-bytes-per-second: bulk (0,0) ctrl (0,0) meta (0,0) r
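The ticker lines are the telling part: CLUSTER-SIZE stays at 1 and `heartbeat-received` reports `foreign 0`, i.e. no heartbeats from other nodes are being counted, even though tcpdump sees them on the wire. A small sketch that pulls those fields out of the log, assuming the 3.15 ticker line format shown above; `summarize_ticker` and `peers_heard` are hypothetical helpers.

```python
import re

# Matches either the NODE-ID/CLUSTER-SIZE ticker line or the heartbeat-received line.
TICKER = re.compile(
    r"NODE-ID (?P<node>\w+) CLUSTER-SIZE (?P<size>\d+)"
    r"|heartbeat-received: self (?P<self>\d+) foreign (?P<foreign>\d+)"
)


def summarize_ticker(log_text: str):
    """Return (node_id, cluster_sizes, [(self, foreign), ...]) from ticker lines."""
    node, sizes, hb = None, [], []
    for line in log_text.splitlines():
        m = TICKER.search(line)
        if not m:
            continue
        if m.group("node"):
            node = m.group("node")
            sizes.append(int(m.group("size")))
        else:
            hb.append((int(m.group("self")), int(m.group("foreign"))))
    return node, sizes, hb


def peers_heard(hb) -> bool:
    """True if any ticker interval counted heartbeats from another node."""
    return any(foreign > 0 for _, foreign in hb)


sample = """\
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:164) NODE-ID bb98be4ca005452 CLUSTER-SIZE 1
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:286) heartbeat-received: self 198 foreign 0
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:164) NODE-ID bb98be4ca005452 CLUSTER-SIZE 1
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:286) heartbeat-received: self 398 foreign 0
"""

node, sizes, hb = summarize_ticker(sample)
print(node, sizes, hb, peers_heard(hb))
```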
Any guidance would be appreciated.