Aerospike not clustering using multicast or mesh

gosubwoy · November 14, 2017, 8:56pm

Good day!

I’m having trouble getting Aerospike to cluster using either multicast or mesh heartbeats. I have 3 nodes with Aerospike installed, managed by Ansible. I have confirmed multicast is working using iperf (below), but it seems like Aerospike is sending data out of order.

Aerospike itself starts up with no issue, but CLUSTER-SIZE always stays at 1.

Aerospike Version: Aerospike Community Edition build 3.15.0.2

OS: CentOS Linux release 7.4.1708 (Core)

IP: 172.28.128.17, 172.28.128.19, 172.28.128.20

aerospike.conf

service {
	proto-fd-max 15000
	paxos-single-replica-limit 1
}

logging {
	console {
		context any info
	}
}

network {
	service {
		address 172.28.128.17
		reuse-address
		access-address 172.28.128.17
		port 3000
	}

	heartbeat {
		mode multicast
		multicast-group 239.1.99.222
		port 9918
		address 172.28.128.17
		interval 150
		timeout 10
	}

	fabric {
		address 172.28.128.17
		port 3001
	}

	info {
		address 172.28.128.17
		port 3003
	}
}
namespace mem_cache {
	replication-factor 2
	memory-size 4G
	default-ttl 0
	storage-engine memory
}

firewall-cmd --list-ports

5353/udp 161/udp 3000/tcp 3001/tcp 3003/tcp 9918/udp 3002/tcp

tcpdump -i eth1 | grep 9918

15:40:29.826564 IP aerospike1.local.9918 > 239.1.99.222.9918: UDP, length 112
15:40:29.827115 IP 172.28.128.19.9918 > 239.1.99.222.9918: UDP, length 112
15:40:29.977982 IP 172.28.128.20.9918 > 239.1.99.222.9918: UDP, length 112

ip maddr

3:	eth1
	link  01:00:5e:00:00:01
	link  33:33:00:00:00:01
	link  33:33:ff:a2:a5:65
	link  01:00:5e:00:00:fb
	link  01:00:5e:01:63:de
	inet  239.1.99.222
	inet  224.0.0.251
	inet  224.0.0.1
	inet6 ff02::1:ffa2:a565
	inet6 ff02::1
	inet6 ff01::100:

iperf -s -u -B 239.1.99.222 -i 1 -p 9918

------------------------------------------------------------
Server listening on UDP port 9918
Binding to local address 239.1.99.222
Joining multicast group  239.1.99.222
Receiving 1470 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 239.1.99.222 port 9918 connected with 172.28.128.19 port 9918
[  4] local 239.1.99.222 port 9918 connected with 172.28.128.17 port 9918
[  5] local 239.1.99.222 port 9918 connected with 172.28.128.20 port 9918
[ ID] Interval       Transfer     Bandwidth        Jitter   Lost/Total Datagrams
[  3]  0.0- 1.0 sec   784 Bytes  6.27 Kbits/sec  49.145 ms   99/  106 (93%)
[  3] 0.00-1.00 sec  6 datagrams received out-of-order
[  4]  0.0- 1.0 sec   784 Bytes  6.27 Kbits/sec  48.463 ms   99/  106 (93%)
[  4] 0.00-1.00 sec  6 datagrams received out-of-order
[  5]  0.0- 1.0 sec   784 Bytes  6.27 Kbits/sec  48.499 ms   99/  106 (93%)
[  5] 0.00-1.00 sec  6 datagrams received out-of-order
[  3]  1.0- 2.0 sec   784 Bytes  6.27 Kbits/sec  87.291 ms    0/    0 (-nan%)
[  3] 1.00-2.00 sec  7 datagrams received out-of-order
[  4]  1.0- 2.0 sec   784 Bytes  6.27 Kbits/sec  86.029 ms    0/    0 (-nan%)
[  4] 1.00-2.00 sec  7 datagrams received out-of-order
[  5]  1.0- 2.0 sec   784 Bytes  6.27 Kbits/sec  85.953 ms    0/    0 (-nan%)
[  5] 1.00-2.00 sec  7 datagrams received out-of-order
[  3]  2.0- 3.0 sec   672 Bytes  5.38 Kbits/sec  107.921 ms    0/    0 (-nan%)
[  3] 2.00-3.00 sec  6 datagrams received out-of-order
[  4]  2.0- 3.0 sec   672 Bytes  5.38 Kbits/sec  107.055 ms    0/    0 (-nan%)
[  4] 2.00-3.00 sec  6 datagrams received out-of-order
[  5]  2.0- 3.0 sec   672 Bytes  5.38 Kbits/sec  107.033 ms    0/    0 (-nan%)
[  5] 2.00-3.00 sec  6 datagrams received out-of-order
[  3]  3.0- 4.0 sec   784 Bytes  6.27 Kbits/sec  123.574 ms    0/    0 (-nan%)
[  3] 3.00-4.00 sec  7 datagrams received out-of-order
[  4]  3.0- 4.0 sec   784 Bytes  6.27 Kbits/sec  123.003 ms    0/    0 (-nan%)

Relevant logs

Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533)
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) service {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	proto-fd-max 15000
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	paxos-single-replica-limit 1
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) }
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533)
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) logging {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	console {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		context any info
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	}
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) }
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533)
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) network {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	service {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		address 172.28.128.17
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		reuse-address
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		access-address 172.28.128.17
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		port 3000
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	}
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533)
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	heartbeat {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		mode multicast
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		multicast-group 239.1.99.222
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		port 9918
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		address 172.28.128.17
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		interval 150
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		timeout 10
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	}
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533)
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	fabric {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		address 172.28.128.17
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		port 3001
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	}
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533)
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	info {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		address 172.28.128.17
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 		port 3003
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	}
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) }
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) namespace mem_cache {
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	replication-factor 2
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	memory-size 4G
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	default-ttl 0
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) 	storage-engine memory
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3533) }
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3553) system file descriptor limit: 100000, proto-fd-max: 15000
Nov 14 2017 20:44:02 GMT: INFO (hardware): (hardware.c:1785) detected 3 CPU(s), 3 core(s), 1 NUMA node(s)
Nov 14 2017 20:44:02 GMT: INFO (socket): (socket.c:2566) Node port 3001, node ID bb98be4ca005452
Nov 14 2017 20:44:02 GMT: INFO (config): (cfg.c:3594) node-id bb98be4ca005452
Nov 14 2017 20:44:02 GMT: INFO (namespace): (namespace_ce.c:96) {mem_cache} beginning COLD start
Nov 14 2017 20:44:02 GMT: INFO (as): (as.c:372) initializing services...
Nov 14 2017 20:44:02 GMT: INFO (tsvc): (thr_tsvc.c:118) 4 transaction queues: starting 4 threads per queue
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:785) updated fabric published address list to {172.28.128.17:3001}
Nov 14 2017 20:44:02 GMT: INFO (partition): (partition_balance.c:273) {mem_cache} 4096 partitions: found 4096 absent, 0 stored
Nov 14 2017 20:44:02 GMT: INFO (batch): (batch.c:597) starting 3 batch-index-threads
Nov 14 2017 20:44:02 GMT: INFO (batch): (thr_batch.c:374) starting 4 batch-threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:475) starting 8 fabric send threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:492) starting 16 fabric rw channel recv threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:492) starting 4 fabric ctrl channel recv threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:492) starting 4 fabric bulk channel recv threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:492) starting 4 fabric meta channel recv threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (fabric.c:498) starting fabric accept thread
Nov 14 2017 20:44:02 GMT: INFO (hb): (hb.c:7537) initializing multicast heartbeat socket: 239.1.99.222:9918
Nov 14 2017 20:44:02 GMT: INFO (socket): (socket.c:1342) Setting multicast interface address: 172.28.128.17
Nov 14 2017 20:44:02 GMT: INFO (socket): (socket.c:1374) Joining multicast group: 239.1.99.222
Nov 14 2017 20:44:02 GMT: INFO (hb): (hb.c:7569) mtu of the network is 1500
Nov 14 2017 20:44:02 GMT: INFO (hb): (socket.c:1410) Started multicast heartbeat endpoint 172.28.128.17:9918
Nov 14 2017 20:44:02 GMT: INFO (nsup): (thr_nsup.c:1166) starting namespace supervisor threads
Nov 14 2017 20:44:02 GMT: INFO (demarshal): (thr_demarshal.c:894) starting 3 demarshal threads
Nov 14 2017 20:44:02 GMT: INFO (fabric): (socket.c:708) Started fabric endpoint 172.28.128.17:3001
Nov 14 2017 20:44:02 GMT: INFO (demarshal): (socket.c:708) Started client endpoint 172.28.128.17:3000
Nov 14 2017 20:44:02 GMT: INFO (demarshal): (socket.c:708) Started client endpoint 127.0.0.1:3000
Nov 14 2017 20:44:02 GMT: INFO (info-port): (thr_info_port.c:307) starting info port thread
Nov 14 2017 20:44:02 GMT: INFO (info-port): (socket.c:708) Started info endpoint 172.28.128.17:3003
Nov 14 2017 20:44:02 GMT: INFO (as): (as.c:415) service ready: soon there will be cake!
Nov 14 2017 20:44:02 GMT: INFO (info): (thr_info.c:3453) Aerospike Telemetry Agent: Aerospike anonymous data collection is ACTIVE. For further information, see http://aerospike.com/aerospike-telemetry
Nov 14 2017 20:44:04 GMT: INFO (clustering): (clustering.c:7804) principal node - forming new cluster with succession list: bb98be4ca005452
Nov 14 2017 20:44:04 GMT: INFO (clustering): (clustering.c:5517) applied new cluster key c7d50c8da169
Nov 14 2017 20:44:04 GMT: INFO (clustering): (clustering.c:7804) applied new succession list bb98be4ca005452
Nov 14 2017 20:44:04 GMT: INFO (clustering): (clustering.c:5521) applied cluster size 1
Nov 14 2017 20:44:04 GMT: INFO (exchange): (exchange.c:1977) data exchange started with cluster key c7d50c8da169
Nov 14 2017 20:44:04 GMT: INFO (exchange): (exchange.c:2615) received commit command from principal node bb98be4ca005452
Nov 14 2017 20:44:04 GMT: INFO (exchange): (exchange.c:2554) data exchange completed with cluster key c7d50c8da169
Nov 14 2017 20:44:04 GMT: INFO (partition): (partition_balance.c:1000) {mem_cache} replication factor is 1
Nov 14 2017 20:44:04 GMT: INFO (partition): (partition_balance.c:974) {mem_cache} rebalanced: expected-migrations (0,0) expected-signals 0
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:164) NODE-ID bb98be4ca005452 CLUSTER-SIZE 1
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:241)    system-memory: free-kbytes 345124 free-pct 69 heap-kbytes (1087685,1088156,1114112) heap-efficiency-pct 97.6
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:255)    in-progress: tsvc-q 0 info-q 0 nsup-delete-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:277)    fds: proto (0,3,3) heartbeat (0,0,0) fabric (0,0,0)
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:286)    heartbeat-received: self 198 foreign 0
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:316)    fabric-bytes-per-second: bulk (0,0) ctrl (0,0) meta (0,0) rw (0,0)
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:371) {mem_cache} objects: all 0 master 0 prole 0 non-replica 0
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:416) {mem_cache} migrations: complete
Nov 14 2017 20:44:12 GMT: INFO (info): (ticker.c:435) {mem_cache} memory-usage: total-bytes 0 index-bytes 0 sindex-bytes 0 data-bytes 0 used-pct 0.00
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:164) NODE-ID bb98be4ca005452 CLUSTER-SIZE 1
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:241)    system-memory: free-kbytes 345096 free-pct 69 heap-kbytes (1087685,1088156,1114112) heap-efficiency-pct 97.6
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:255)    in-progress: tsvc-q 0 info-q 0 nsup-delete-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:277)    fds: proto (0,3,3) heartbeat (0,0,0) fabric (0,0,0)
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:286)    heartbeat-received: self 398 foreign 0
Nov 14 2017 20:44:22 GMT: INFO (info): (ticker.c:316)    fabric-bytes-per-second: bulk (0,0) ctrl (0,0) meta (0,0) r

Any guidance would be appreciated.
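The two ticker fields worth watching here are CLUSTER-SIZE and heartbeat-received. A quick check on the numbers above: with interval 150 (milliseconds), one node sends about 10000 / 150, roughly 67 heartbeats per 10-second ticker interval. Yet self rises by 200 between the 20:44:12 and 20:44:22 ticks (198 to 398), about three times that, while foreign stays at 0. That would be consistent with the other two nodes' heartbeats arriving but being counted as this node's own, for example because all three nodes share one node ID, which is what the thread eventually finds. The Python sketch below is an illustration, not an Aerospike tool. It pulls these fields out of a log and computes that ratio, and its regexes assume the 3.15 ticker format shown above.

```python
import re

# Regexes for two ticker lines from the log above; the field names are copied
# from that 3.15 output and may be formatted differently in other versions.
CLUSTER_RE = re.compile(r"NODE-ID (\w+) CLUSTER-SIZE (\d+)")
HB_RE = re.compile(r"heartbeat-received: self (\d+) foreign (\d+)")


def parse_ticker(lines):
    """Collect (cluster_size, self_hb, foreign_hb) for each ticker interval."""
    samples = []
    size = None
    for line in lines:
        m = CLUSTER_RE.search(line)
        if m:
            size = int(m.group(2))
            continue
        m = HB_RE.search(line)
        if m and size is not None:
            # The heartbeat counters are cumulative since startup.
            samples.append((size, int(m.group(1)), int(m.group(2))))
    return samples


def senders_seen_as_self(prev_self, cur_self, tick_s=10.0, interval_ms=150):
    """Ratio of 'self' heartbeats received per tick to what one node sends per tick.

    Near 1.0: only this node's own heartbeats are looping back.
    Near N: roughly N senders are being counted as 'self'.
    """
    sent_per_tick = tick_s * 1000.0 / interval_ms
    return (cur_self - prev_self) / sent_per_tick
```

Fed the four relevant lines above, parse_ticker returns [(1, 198, 0), (1, 398, 0)], and senders_seen_as_self(198, 398) comes out at about 3.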

pgupta · November 15, 2017, 5:30am

Remove the address entry from the heartbeat stanza. You use it for mesh mode; for multicast you use multicast-group. By using both, you are creating some conflict and confusion in the server. Can you try commenting out the address entry?

# address 172....

gosubwoy · November 15, 2017, 1:59pm

I’ve removed the address entry and deployed again. The issue persists; I’m still getting CLUSTER-SIZE 1.

/etc/aerospike/aerospike.conf

service {
	proto-fd-max 15000
	paxos-single-replica-limit 1
}

logging {
	console {
		context any info
	}
}

network {
	service {
		address 172.28.128.17
		reuse-address
		access-address 172.28.128.17
		port 3000
	}

	heartbeat {
		mode multicast
		multicast-group 239.1.99.222
		port 9918
		interval 150
		timeout 10
	}

	fabric {
		address 172.28.128.17
		port 3001
	}

	info {
		address 172.28.128.17
		port 3003
	}
}
namespace mem_cache {
	replication-factor 2
	memory-size 4G
	default-ttl 0
	storage-engine memory
}

journalctl -u aerospike -a -o cat -f | grep CLUSTER-SIZE

Nov 14 2017 21:29:27 GMT: INFO (info): (ticker.c:164) NODE-ID bb98be4ca005452 CLUSTER-SIZE 1
Nov 14 2017 21:29:37 GMT: INFO (info): (ticker.c:164) NODE-ID bb98be4ca005452 CLUSTER-SIZE 1

pgupta · November 15, 2017, 2:25pm

What is your network topology? How many multicast hops are there between nodes? By default, mcast-ttl in Aerospike is 0, so only one hop. You may have to increase mcast-ttl in the multicast configuration based on your topology.
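For reference, the TTL mentioned here is the standard IP multicast TTL. Each router hop decrements it; TTL 1 keeps datagrams on the local subnet, and TTL 0 keeps them on the sending host. A minimal Python sketch of the socket option involved follows. It is plain socket code, not Aerospike's, and how Aerospike's mcast-ttl value (including its default of 0) maps onto this option is not something the sketch establishes. In this setup, the routing table posted in the reply shows 172.28.128.0/24 as directly connected on eth1, so the nodes are not separated by a router and a TTL of 1 should already reach them.

```python
import socket


def make_multicast_sender(ttl=1):
    """UDP socket whose multicast datagrams may cross at most `ttl` router hops.

    TTL 1 keeps packets on the local subnet; TTL 0 keeps them on this host.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock
```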

gosubwoy · November 15, 2017, 2:50pm

This is being done in VirtualBox with three VMs, each with two NICs.

eth0 is the NAT adapter.

eth1 is a Host-only adapter attached to vboxnet0 (below).

vboxnet0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
	ether 0a:00:27:00:00:00
	inet 172.28.128.1 netmask 0xffffff00 broadcast 172.28.128.255

eth0: connected to eth0
        "Intel 82540EM Gigabit Ethernet Controller (PRO/1000 MT Desktop Adapter)"
        ethernet (e1000), 52:54:00:CA:E4:8B, hw, mtu 1500
        ip4 default
        inet4 172.17.255.239/27
        inet6 fe80::5054:ff:feca:e48b/64

eth1: connected to eth1
        "Intel 82540EM Gigabit Ethernet Controller (PRO/1000 MT Desktop Adapter)"
        ethernet (e1000), 08:00:27:A2:A5:65, hw, mtu 1500
        inet4 172.28.128.17/24
        inet6 fe80::a00:27ff:fea2:a565/64

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    100    0        0 eth0
172.17.255.224  0.0.0.0         255.255.255.224 U     100    0        0 eth0
172.28.128.0    0.0.0.0         255.255.255.0   U     100    0        0 eth1

Also potentially useful: with address commented out, multicast heartbeats arrived on eth0 instead of eth1. When I added it back, I confirmed I was receiving traffic on port 9918 on eth1 from the other 2 nodes.
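That observation matches how Linux multicast membership works. A socket joins a group on one specific interface, and if the joining code passes no interface address, the kernel picks one by routing the group address. Here that falls through to the default route on eth0. The minimal Python listener below joins 239.1.99.222:9918 on an explicitly chosen interface address and prints each sender. GROUP and PORT come from the config above, while the helper names are made up for this illustration. It is meant to be run on a node where asd is stopped, so the port is free.

Unlike iperf, it does not interpret the payload. This matters because the iperf server above received 784 bytes per second from each sender, which is seven 112-byte datagrams, matching the 112-byte heartbeats in the tcpdump output at about 6.7 per second. iperf was most likely reading Aerospike heartbeats as its own test datagrams, so its loss and out-of-order figures probably reflect that rather than real reordering.

```python
import socket
import struct

GROUP = "239.1.99.222"  # multicast-group from the config above
PORT = 9918             # heartbeat port from the config above


def membership_request(group, iface_addr):
    """Pack an ip_mreq: the group address followed by the local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_addr))


def open_listener(group, port, iface_addr):
    """Join `group` on the interface that owns `iface_addr`, not the default route's."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_addr))
    # Anything sent from this socket to a multicast group leaves via the same interface.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(iface_addr))
    return sock


def print_senders(sock, count=10):
    """Print the size and source address of each datagram received on the group."""
    for _ in range(count):
        data, (src, sport) = sock.recvfrom(2048)
        print(f"{len(data)} bytes from {src}:{sport}")
```

On the .17 node, print_senders(open_listener(GROUP, PORT, "172.28.128.17")) should list 172.28.128.19 and 172.28.128.20 if their heartbeats reach eth1.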

rguo · November 15, 2017, 7:30pm

Multicast is often not supported in virtualized networks.

You can try enabling promiscuous mode in each VM’s network adapter settings.

Personally, I don’t recommend multicast as it’s all too easy to inadvertently add nodes into clusters that aren’t meant to be clustered.

gosubwoy · November 15, 2017, 9:03pm

I’ve set the adapters to promiscuous mode with Allow All, no dice. =(

Anything else I can look at?

rguo · November 15, 2017, 9:17pm

Are you sure the VMs are on the same virtual network?

NAT won’t work, and host-only doesn’t permit VM to VM traffic. You’d need a VM private network, either Internal Network, or “NAT Network” (not NAT), or Bridged.

[edit] I was wrong, host-only does permit VM to VM traffic.

gosubwoy · November 16, 2017, 7:30pm

Update:

I’ve tried Internal Network, Bridged, and NAT Network, all to no avail. I’ve also tried the mesh heartbeat; CLUSTER-SIZE remains 1.

heartbeat {
	mode mesh
	address 10.5.42.110
	port 3002
	interval 150
	timeout 10
	mesh-seed-address-port 10.5.42.110 3002
	mesh-seed-address-port 10.5.42.139 3002
	mesh-seed-address-port 10.5.42.149 3002
}

Any other ideas?
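Mesh heartbeats travel over TCP (port 3002 in this stanza), so unlike multicast they can be probed with an ordinary connect from each node to every seed. Below is a minimal Python sketch, assuming the seed list above. A refused or timed-out connection points at networking or the firewall (3002/tcp is open in the firewall-cmd output earlier). A successful connect only shows that something is listening on that port.

```python
import socket

# Mesh seeds from the heartbeat stanza above, as (address, port).
SEEDS = [("10.5.42.110", 3002), ("10.5.42.139", 3002), ("10.5.42.149", 3002)]


def tcp_reachable(host, port, timeout=2.0):
    """True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_seeds(seeds, timeout=2.0):
    """Map 'host:port' to reachability for every seed."""
    return {f"{h}:{p}": tcp_reachable(h, p, timeout) for h, p in seeds}
```

Running print(check_seeds(SEEDS)) on each of the three nodes should show True for all entries before the cluster can form over mesh.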

pgupta · November 16, 2017, 8:46pm

If getting networking working between VMs is an issue, you can always start three separate Aerospike processes in the same VM, separate them by port numbers, and form a cluster. Just allocate 3x the memory and disk to the VM. You will need three separate config files (a1.conf, a2.conf, and a3.conf) and then pass a command-line argument specifying the config file for each asd launch.
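A sketch of that setup follows, assuming a hypothetical port layout in which node i (0, 1, 2) uses 3000 + 100*i for service, +1 for fabric, +2 for the mesh heartbeat, and +3 for info. All three processes listen on 127.0.0.1, and each lists the other two as mesh seeds. The Python helper only renders a1.conf to a3.conf from a template. The stanzas mirror the ones used in this thread, with a hypothetical 1G namespace per process. Other per-process settings that a real deployment may need to keep distinct, such as the pid file or work directory, are left out.

```python
from pathlib import Path

BASE_PORT = 3000  # hypothetical: node i uses BASE_PORT + 100*i ... BASE_PORT + 100*i + 3
NODES = 3

TEMPLATE = """network {{
\tservice {{
\t\taddress 127.0.0.1
\t\tport {service}
\t}}

\theartbeat {{
\t\tmode mesh
\t\taddress 127.0.0.1
\t\tport {heartbeat}
{seeds}
\t\tinterval 150
\t\ttimeout 10
\t}}

\tfabric {{
\t\taddress 127.0.0.1
\t\tport {fabric}
\t}}

\tinfo {{
\t\taddress 127.0.0.1
\t\tport {info}
\t}}
}}

namespace mem_cache {{
\treplication-factor 2
\tmemory-size 1G
\tstorage-engine memory
}}
"""


def ports(i):
    """Port assignment for node i under the hypothetical layout above."""
    base = BASE_PORT + 100 * i
    return {"service": base, "fabric": base + 1, "heartbeat": base + 2, "info": base + 3}


def render(i, n=NODES):
    """Config text for node i, seeding the mesh with every other node's heartbeat port."""
    seeds = "\n".join(
        f"\t\tmesh-seed-address-port 127.0.0.1 {ports(j)['heartbeat']}"
        for j in range(n) if j != i
    )
    return TEMPLATE.format(seeds=seeds, **ports(i))


def write_all(directory=".", n=NODES):
    """Write a1.conf .. aN.conf into `directory`."""
    for i in range(n):
        Path(directory, f"a{i + 1}.conf").write_text(render(i, n))
```

write_all(".") creates the three files. Each process would then be started with its own file, for example asd --config-file a1.conf. Distinct fabric ports should also give each process a distinct node ID, even on a single VM.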

rguo · November 16, 2017, 10:27pm

After experimenting with various Virtualbox settings, here’s what worked on my end:

VirtualBox 5.1.30 on OSX.

VM: eth0 NAT, eth1 Host-only

Limiting the Aerospike cluster to the “internal” (host-only) network: notice how I lock down the interface for each network stanza using the address parameter. You need to do this when a VM has multiple NICs. I’m assuming eth0 (NAT) is where client access comes from and eth1 (Host-only) handles all cluster-internal traffic.

network {
        service {
                address eth0
                port 3000
        }

        heartbeat {
                address eth1
                mode mesh
                port 3002
                mesh-seed-address-port 192.168.99.101 3002 # the eth1 IP and heartbeat port of another node; one line per seed
                interval 150
                timeout 10
        }

        fabric {
                address eth1
                port 3001
        }

        info {
                address eth0
                port 3003
        }
}

This should also work with Internal Network, since Host-only is a superset of Internal. However, by default Internal Network does not provide a DHCP service. For more on VirtualBox’s networking models, see Chapter 6, Virtual Networking, of the VirtualBox manual.
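The stanzas above name interfaces (eth0, eth1) rather than addresses, which only helps if each name points where you expect on every VM. On Linux, the SIOCGIFADDR ioctl is one quick way to confirm the mapping. The Python sketch below is a generic helper for that check, not a description of how Aerospike resolves interface names, and it only reports each interface's primary IPv4 address. Running ip -4 addr show eth1 gives the same information.

```python
import fcntl
import socket
import struct

SIOCGIFADDR = 0x8915  # Linux ioctl: get an interface's IPv4 address


def interface_ipv4(ifname):
    """Return the primary IPv4 address of `ifname` (Linux only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        req = struct.pack("256s", ifname.encode()[:15])
        res = fcntl.ioctl(s.fileno(), SIOCGIFADDR, req)
    # struct ifreq: 16-byte name, then sockaddr_in (family, port, then the 4-byte address).
    return socket.inet_ntoa(res[20:24])


def all_interfaces():
    """Map every interface name to its IPv4 address, or None if it has none."""
    result = {}
    for _, name in socket.if_nameindex():
        try:
            result[name] = interface_ipv4(name)
        except OSError:
            result[name] = None  # interface exists but has no IPv4 address
    return result
```

On the VM listed earlier, all_interfaces() would be expected to show eth1 mapped to 172.28.128.17 and eth0 to the NAT address.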

gosubwoy · November 17, 2017, 4:26pm

I’ve tried the above configuration with eth0 being NAT and eth1 being Host-only, my CLUSTER-SIZE remains at 1.

When switching eth0 to NAT Network and eth1 to Bridged Network with Promiscuous mode set to Allow All, I’m able to cluster using mesh and multicast.

It does seem that host-only permits VM to VM communication.

EDIT:

VBox Version: 5.2.0r118431

gosubwoy · November 17, 2017, 9:37pm

After some further research, we’ve noticed a few things:

The NODE ID is the same on all Aerospike nodes.
Even when the eth1 address is stated explicitly, Aerospike binds itself to eth0.

rguo · November 17, 2017, 10:19pm

Same NODE ID reveals that each VM has the same MAC address. What is the NODE ID, just out of curiosity?

Please post your aerospike.conf’s network stanza as it currently is and the current network config of Virtualbox.

gosubwoy · November 20, 2017, 2:04pm

Network

network {
	service {
		address 172.28.128.8
		port 3000
	}

	heartbeat {
		mode multicast
		multicast-group 239.1.99.222
		address 172.28.128.8
		port 9918
		interval 150
		timeout 10
	}

	fabric {
		address 172.28.128.8
		port 3001
	}

	info {
		address 172.28.128.8
		port 3003
	}
}

The MAC address of the NAT adapter (eth0) is the same on all the Aerospike VMs, and it corresponds to the NODE-ID BB98BE4CA005452.

The MAC address of the Host-only adapter (eth1) differs across all VMs. The address values above are set to the Host-only (eth1) address, 172.28.128.x.

Does the address bind influence which MAC is being used to generate the NODE-ID, or does it default to eth0?
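The IDs in this thread fit a simple layout, inferred from the values posted here rather than from the Aerospike source. The fabric port occupies the top 16 bits, and the six MAC bytes, in reversed (little-endian) order, occupy the lower 48. The startup log prints "Node port 3001, node ID bb98be4ca005452". 3001 is 0xBB9, and the remaining 8BE4CA005452 is the eth0 MAC 52:54:00:CA:E4:8B from the interface listing, read backwards. If that holds, the ID depends on which interface's MAC is used, not on the address a stanza binds to. The Python helpers below encode and decode IDs under that assumption.

```python
def node_id_from_mac(mac, fabric_port):
    """Assumed layout: fabric port in the top 16 bits, MAC bytes (little-endian) below."""
    mac_bytes = bytes(int(b, 16) for b in mac.split(":"))
    if len(mac_bytes) != 6:
        raise ValueError(f"not a 6-byte MAC: {mac}")
    value = (fabric_port << 48) | int.from_bytes(mac_bytes, "little")
    return format(value, "X")


def decode_node_id(node_id):
    """Split a hex node ID into (fabric_port, MAC) under the same assumption."""
    value = int(node_id, 16)
    port = value >> 48
    mac_bytes = (value & 0xFFFFFFFFFFFF).to_bytes(6, "little")
    return port, ":".join(f"{b:02X}" for b in mac_bytes)
```

Applied to the IDs posted after the fix below (for example BB9BDC316270008), decode_node_id yields port 3001 and distinct MACs beginning 08:00:27, the same prefix as the eth1 MAC in the earlier listing. That is consistent with node-id-interface eth1 taking effect.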

gosubwoy · November 20, 2017, 2:34pm

Found where the configuration is being loaded.

https://github.com/aerospike/aerospike-server/blob/f06a3839054f5a396a41d0aa32d6dea6e3009a17/as/src/base/cfg.c#L771

and parsed

https://github.com/aerospike/aerospike-server/blob/f06a3839054f5a396a41d0aa32d6dea6e3009a17/as/src/base/thr_info.c#L1502

Explicitly setting node-id-interface to eth1 in the service stanza appears to have solved this issue.

EDIT:

Even with multicast, you must explicitly set address in the heartbeat sub-stanza of network.

service {
	proto-fd-max 15000
	paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
	node-id-interface eth1
}

logging {
	console {
		context any info
	}
}

network {
	service {
		address 172.28.128.8
		port 3000
	}

	heartbeat {
		mode multicast
		multicast-group 239.1.99.222
		address 172.28.128.8
		port 9918
		interval 150
		timeout 10
	}

	fabric {
		address 172.28.128.8
		port 3001
	}

	info {
		address 172.28.128.8
		port 3003
	}
}

namespace foo {
	replication-factor 2
	memory-size 4G
	default-ttl 0
	storage-engine memory
}

NODE-ID

aerospike1.local | success | rc=0 >>
BB9BDC316270008

aerospike2.local | success | rc=0 >>
BB95CB2EA270008

aerospike3.local | success | rc=0 >>
BB99C7209270008

journalctl -u aerospike -f | grep "CLUSTER-SIZE"
Nov 20 09:34:47 aerospike1 asd[9945]: Nov 20 2017 14:34:47 GMT: INFO (info): (ticker.c:164) NODE-ID bb9bdc316270008 CLUSTER-SIZE 3
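Since the root cause here was every VM reporting the same node ID, a check for duplicate IDs across hosts is cheap to add to the Ansible run. The minimal Python sketch below parses ad-hoc output in the "host | success | rc=0 >>" format shown above and reports any ID shared by more than one host. The parser is written against that exact format, failed hosts are skipped, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Header line of the ad-hoc output shown above; the command's stdout (the node ID)
# follows on the next non-empty line. Hosts that did not succeed are skipped.
HEADER_RE = re.compile(r"^(\S+) \| success \| rc=0 >>$")


def node_ids_by_host(text):
    """Map host name to the node ID printed under its header."""
    ids = {}
    host = None
    for line in text.splitlines():
        line = line.strip()
        m = HEADER_RE.match(line)
        if m:
            host = m.group(1)
        elif line and host:
            ids[host] = line.upper()
            host = None
    return ids


def duplicate_ids(ids):
    """Map each node ID seen on more than one host to the list of those hosts."""
    hosts_by_id = defaultdict(list)
    for host, node_id in ids.items():
        hosts_by_id[node_id.upper()].append(host)
    return {nid: hosts for nid, hosts in hosts_by_id.items() if len(hosts) > 1}
```

An empty result from duplicate_ids means every host reported its own ID, as in the post-fix output above.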

rguo · November 21, 2017, 7:35pm

Having multiple NICs does require special care in configuration, particularly in this scenario where each network behaves differently.

system · November 27, 2017, 7:36pm

This topic was automatically closed 6 days after the last reply. New replies are no longer allowed.
