I'm using mesh mode to form a cluster. It worked fine until some time ago, when the cluster broke apart. Aerospike is running on all nodes, but they no longer form a single cluster. Aerospike runs under Docker with host networking. Here is my config:
# Aerospike database configuration file.
# This stanza must come first.
service {
    user root
    group root
    # paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 16
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 100000
    proto-fd-idle-ms 10000
    paxos-recovery-policy auto-reset-master
    paxos-max-cluster-size 60
    query-in-transaction-thread true
    allow-inline-transactions false
}
logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
    # Send log messages to stdout
    console {
        context any critical
    }
}
network {
    service {
        address any
        port 3000
        # Uncomment the following to set the `access-address` parameter to the
        # IP address of the Docker host. This will allow the server to correctly
        # publish the address which applications and other nodes in the cluster
        # should use when addressing this node.
        access-address 10.x.x.30
    }
    heartbeat {
        #mode multicast
        #address 239.1.99.2
        #port 9918
        # mesh is used for environments that do not support multicast
        mode mesh
        port 3002
        mesh-seed-address-port node.dc01.domain 3002
        mesh-seed-address-port node.dc02.domain 3002
        mesh-seed-address-port node.dc03.domain 3002
        mesh-seed-address-port node.dc04.domain 3002
        mesh-seed-address-port node.dc05.domain 3002
        mesh-seed-address-port node.dc06.domain 3002
        mesh-seed-address-port node.dc07.domain 3002
        mesh-seed-address-port node.dc08.domain 3002
        mesh-seed-address-port node.dc09.domain 3002
        mesh-seed-address-port node.dc10.domain 3002
        mesh-seed-address-port node.dc11.domain 3002
        mesh-seed-address-port node.dc12.domain 3002
        mesh-seed-address-port node.dc13.domain 3002
        mesh-seed-address-port node.dc14.domain 3002
        mesh-seed-address-port node.dc15.domain 3002
        mesh-seed-address-port node.dc16.domain 3002
        mesh-seed-address-port node.dc17.domain 3002
        mesh-seed-address-port node.dc18.domain 3002
        mesh-seed-address-port node.dc19.domain 3002
        mesh-seed-address-port node.dc20.domain 3002
        mesh-seed-address-port node.dc21.domain 3002
        mesh-seed-address-port node.dc22.domain 3002
        mesh-seed-address-port node.dc23.domain 3002
        mesh-seed-address-port node.dc24.domain 3002
        mesh-seed-address-port node.dc25.domain 3002
        mesh-seed-address-port node.dc26.domain 3002
        mesh-seed-address-port node.dc27.domain 3002
        mesh-seed-address-port node.dc28.domain 3002
        mesh-seed-address-port node.dc29.domain 3002
        mesh-seed-address-port node.dc30.domain 3002
        mesh-seed-address-port node.dc31.domain 3002
        mesh-seed-address-port node.dc32.domain 3002
        mesh-seed-address-port node.dc33.domain 3002
        # use asinfo -v 'tip:host=<ADDR>;port=3002' to inform cluster of
        # other mesh nodes
        interval 150
        timeout 20
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace test {
    replication-factor 2
    memory-size 5G
    default-ttl 1d # 1 day, use 0 to never expire/evict.
    storage-engine memory
    # To use file storage backing, comment out the line above and use the
    # following lines instead.
    #storage-engine device {
    #    file /opt/aerospike/data/test.dat
    #    filesize 4G
    #    data-in-memory true # Store data in memory in addition to file.
    #}
}
namespace incProfiles {
    replication-factor 2
    memory-size 90G
    default-ttl 6d
    storage-engine memory
    high-water-memory-pct 90
    stop-writes-pct 95
    #high-water-disk-pct 50
    #conflict-resolution-policy=generation
}
Here is the network status of the cluster in its bad state:
Admin> info network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node                    Node             Ip              Build    Cluster  Cluster           Cluster    Principal        Client  Uptime
.                       Id               .               .        Size     Key               Integrity  .                Conns   .
10.x.x.10:3000          000000000000000  10.x.x.10:3000  N/E      N/E      N/E               N/E        N/E              N/E     N/E
10.x.x.17:3000          000000000000000  10.x.x.17:3000  N/E      N/E      N/E               N/E        N/E              N/E     N/E
10.x.x.19:3000          000000000000000  10.x.x.19:3000  N/E      N/E      N/E               N/E        N/E              N/E     N/E
10.x.x.40:3000          000000000000000  10.x.x.40:3000  N/E      N/E      N/E               N/E        N/E              N/E     N/E
10.x.x.42:3000          000000000000000  10.x.x.42:3000  N/E      N/E      N/E               N/E        N/E              N/E     N/E
node.dc02.domain:3000   BB970ED6C3ACAB8  10.x.x.11:3000  C-3.9.1  1        C8CD59622BFE51BF  True       BB970ED6C3ACAB8  5       618:25:27
node.dc03.domain:3000   BB9000A6E3ACAB8  10.x.x.12:3000  C-3.9.1  1        FD1C975115230888  True       BB9000A6E3ACAB8  6       618:25:27
node.dc04.domain:3000   BB9B0EA6C3ACAB8  10.x.x.13:3000  C-3.9.1  2        C2E9C37D348C8DAF  False      BB9B0EA6C3ACAB8  9       618:25:26
node.dc06.domain:3000   BB988EC6C3ACAB8  10.x.x.15:3000  C-3.9.1  1        9B4764CE266D06F1  True       BB988EC6C3ACAB8  6       618:25:26
node.dc07.domain:3000   BB9D0F96C3ACAB8  10.x.x.16:3000  C-3.9.1  1        9D14A9D9CC435FBB  True       BB9D0F96C3ACAB8  6       618:25:26
node.dc09.domain:3000   BB958F66C3ACAB8  10.x.x.18:3000  C-3.9.1  1        CF6FA645E615F23E  True       BB958F66C3ACAB8  5       618:25:26
node.dc11.domain:3000   BB950EF6C3ACAB8  10.x.x.20:3000  C-3.9.1  1        9681209EDEE43B08  True       BB950EF6C3ACAB8  6       618:25:26
node.dc12.domain:3000   BB920F56C3ACAB8  10.x.x.21:3000  C-3.9.1  1        DF3C2415BF19C57D  True       BB920F56C3ACAB8  6       618:25:26
node.dc13.domain:3000  *BB9F0ED6C3ACAB8  10.x.x.22:3000  C-3.9.1  1        A262B11D8B30F641  True       BB9F0ED6C3ACAB8  178     618:25:26
node.dc14.domain:3000   BB9B8F56C3ACAB8  10.x.x.23:3000  C-3.9.1  2        7B07432253E20B4A  False      BB9B8F56C3ACAB8  9       618:25:26
node.dc15.domain:3000   BB900FD6C3ACAB8  10.x.x.24:3000  C-3.9.1  2        7B07432253E20B4A  False      BB9B8F56C3ACAB8  9       618:25:26
node.dc16.domain:3000   BB9B0F26C3ACAB8  10.x.x.25:3000  C-3.9.1  1        57E1D8F60D5C3C71  True       BB9B0F26C3ACAB8  6       618:25:26
node.dc17.domain:3000   BB948086E3ACAB8  10.x.x.26:3000  C-3.9.1  1        13D50C00688A09D2  True       BB948086E3ACAB8  5       618:25:26
node.dc18.domain:3000   BB930046D3ACAB8  10.x.x.27:3000  C-3.9.1  1        B1A1F16BAEE15751  True       BB930046D3ACAB8  6       618:25:26
node.dc19.domain:3000   BB9A0F76C3ACAB8  10.x.x.28:3000  C-3.9.1  1        6C3707DDDB0106D6  True       BB9A0F76C3ACAB8  5       618:25:26
node.dc20.domain:3000   BB9C0106E3ACAB8  10.x.x.29:3000  C-3.9.1  1        C249ACA190A7C653  True       BB9C0106E3ACAB8  410     618:25:26
node.dc21.domain:3000   BB9A8F76C3ACAB8  10.x.x.30:3000  C-3.9.1  2        B9ADA580898A8ED2  False      BB9D0EE6C3ACAB8  9       618:25:26
node.dc22.domain:3000   BB978F36D3ACAB8  10.x.x.31:3000  C-3.9.1  1        972A9B8E5F10269B  True       BB978F36D3ACAB8  6       618:25:26
node.dc23.domain:3000   BB9E0106E3ACAB8  10.x.x.32:3000  C-3.9.1  1        ED943079A4BEBDB6  True       BB9E0106E3ACAB8  6       618:25:26
node.dc24.domain:3000   BB9C0086E3ACAB8  10.x.x.33:3000  C-3.9.1  1        DD55D01A8C0729D7  True       BB9C0086E3ACAB8  5       618:25:26
node.dc25.domain:3000   BB9C0F06D3ACAB8  10.x.x.34:3000  C-3.9.1  1        4445F126014122D4  True       BB9C0F06D3ACAB8  6       618:25:26
node.dc26.domain:3000   BB9A8BF6F3ACAB8  10.x.x.35:3000  C-3.9.1  1        2AF01ED007DE00B0  True       BB9A8BF6F3ACAB8  8       618:25:26
node.dc27.domain:3000   BB9D8056E3ACAB8  10.x.x.36:3000  C-3.9.1  1        E956B97BC70D5C9D  True       BB9D8056E3ACAB8  2       618:25:26
node.dc28.domain:3000   BB9E0EE6C3ACAB8  10.x.x.37:3000  C-3.9.1  1        BBF8236C102B8BC1  True       BB9E0EE6C3ACAB8  6       618:25:26
node.dc29.domain:3000   BB9380D6E3ACAB8  10.x.x.38:3000  C-3.9.1  1        ED29A5BF080900CE  True       BB9380D6E3ACAB8  6       618:25:26
node.dc30.domain:3000   BB9A0BD6F3ACAB8  10.x.x.39:3000  C-3.9.1  2        C2E9C37D348C8DAF  False      BB9B0EA6C3ACAB8  9       618:25:26
node.dc32.domain:3000   BB9D0EE6C3ACAB8  10.x.x.41:3000  C-3.9.1  2        B9ADA580898A8ED2  False      BB9D0EE6C3ACAB8  9       618:25:26
Number of rows: 32
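To make the split easier to see, the nodes can be grouped by cluster key: nodes that report the same key consider themselves part of the same cluster. Below is a minimal Python sketch of that grouping. The `NODES` list is only a hand-copied subset of the rows above (the five unreachable N/E nodes and most single-node clusters are left out), and the function name `group_by_cluster_key` is just an illustrative choice, not part of any Aerospike tool.

```python
from collections import defaultdict

# (node, cluster key) pairs hand-copied from the `info network` output.
# This is a subset for illustration, not the full 32 rows.
NODES = [
    ("node.dc02.domain", "C8CD59622BFE51BF"),
    ("node.dc04.domain", "C2E9C37D348C8DAF"),
    ("node.dc13.domain", "A262B11D8B30F641"),
    ("node.dc14.domain", "7B07432253E20B4A"),
    ("node.dc15.domain", "7B07432253E20B4A"),
    ("node.dc21.domain", "B9ADA580898A8ED2"),
    ("node.dc30.domain", "C2E9C37D348C8DAF"),
    ("node.dc32.domain", "B9ADA580898A8ED2"),
]


def group_by_cluster_key(nodes):
    """Map each cluster key to the list of nodes reporting it (input order kept)."""
    groups = defaultdict(list)
    for name, key in nodes:
        groups[key].append(name)
    return dict(groups)


if __name__ == "__main__":
    groups = group_by_cluster_key(NODES)
    # Largest sub-clusters first.
    for key, members in sorted(groups.items(), key=lambda kv: -len(kv[1])):
        print(f"{key}  size={len(members)}  {', '.join(members)}")
```

With the full output, this shows three two-node sub-clusters (dc04 + dc30, dc14 + dc15, dc21 + dc32), while every other reachable node is a cluster of one.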
The configs are almost identical across nodes (except for replication factor and access address):
Admin> show config diff
~~~~~~~~~~~~~~~Network Configuration~~~~~~~~~~~~~~~
NODE                    service.access-address
10.x.x.10:3000          N/E
10.x.x.17:3000          N/E
10.x.x.19:3000          N/E
10.x.x.40:3000          N/E
10.x.x.42:3000          N/E
node.dc02.domain:3000   10.x.x.11
node.dc03.domain:3000   N/E
node.dc04.domain:3000   N/E
node.dc06.domain:3000   N/E
node.dc07.domain:3000   N/E
node.dc09.domain:3000   10.x.x.18
node.dc11.domain:3000   N/E
node.dc12.domain:3000   N/E
node.dc13.domain:3000   10.x.x.22
node.dc14.domain:3000   N/E
node.dc15.domain:3000   N/E
node.dc16.domain:3000   N/E
node.dc17.domain:3000   10.x.x.26
node.dc18.domain:3000   N/E
node.dc19.domain:3000   10.x.x.28
node.dc20.domain:3000   N/E
node.dc21.domain:3000   N/E
node.dc22.domain:3000   N/E
node.dc23.domain:3000   N/E
node.dc24.domain:3000   10.x.x.33
node.dc25.domain:3000   N/E
node.dc26.domain:3000   10.x.x.35
node.dc27.domain:3000   10.x.x.36
node.dc28.domain:3000   N/E
node.dc29.domain:3000   N/E
node.dc30.domain:3000   N/E
node.dc32.domain:3000   N/E
~~~~~~~~~~~~~test Namespace Configuration~~~~~~~~~~~~~
NODE                    repl-factor
node.dc02.domain:3000   1
node.dc03.domain:3000   1
node.dc04.domain:3000   2
node.dc06.domain:3000   1
node.dc07.domain:3000   1
node.dc09.domain:3000   1
node.dc11.domain:3000   1
node.dc12.domain:3000   1
node.dc13.domain:3000   1
node.dc14.domain:3000   2
node.dc15.domain:3000   2
node.dc16.domain:3000   1
node.dc17.domain:3000   1
node.dc18.domain:3000   1
node.dc19.domain:3000   1
node.dc20.domain:3000   1
node.dc21.domain:3000   2
node.dc22.domain:3000   1
node.dc23.domain:3000   1
node.dc24.domain:3000   1
node.dc25.domain:3000   1
node.dc26.domain:3000   1
node.dc27.domain:3000   1
node.dc28.domain:3000   1
node.dc29.domain:3000   1
node.dc30.domain:3000   2
node.dc32.domain:3000   2
~~~~~~~~~~incProfiles Namespace Configuration~~~~~~~~~~
NODE                    repl-factor
node.dc02.domain:3000   1
node.dc03.domain:3000   1
node.dc04.domain:3000   2
node.dc06.domain:3000   1
node.dc07.domain:3000   1
node.dc09.domain:3000   1
node.dc11.domain:3000   1
node.dc12.domain:3000   1
node.dc13.domain:3000   1
node.dc14.domain:3000   2
node.dc15.domain:3000   2
node.dc16.domain:3000   1
node.dc17.domain:3000   1
node.dc18.domain:3000   1
node.dc19.domain:3000   1
node.dc20.domain:3000   1
node.dc21.domain:3000   2
node.dc22.domain:3000   1
node.dc23.domain:3000   1
node.dc24.domain:3000   1
node.dc25.domain:3000   1
node.dc26.domain:3000   1
node.dc27.domain:3000   1
node.dc28.domain:3000   1
node.dc29.domain:3000   1
node.dc30.domain:3000   2
node.dc32.domain:3000   2
I tried the tip command to join the nodes, but that didn't help:
asinfo -h node.dc02.domain -v 'tip:host=node.dc05.domain;port=3002'
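Since `tip` only helps if the nodes can actually reach each other on the heartbeat port, one thing that could be ruled out first is plain name resolution and TCP connectivity to port 3002. Here is a minimal Python sketch for checking that from any host. It uses only the standard `socket` module; the helper names `heartbeat_reachable` and `check_seeds` are hypothetical, and a successful TCP connect only shows the port is open, not that heartbeats are being exchanged.

```python
import socket


def heartbeat_reachable(host, port=3002, timeout=2.0):
    """Return True if `host` resolves and accepts a TCP connection on `port`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False


def check_seeds(hosts, port=3002, timeout=2.0):
    """Return a dict mapping each seed host to its reachability on `port`."""
    return {host: heartbeat_reachable(host, port, timeout) for host in hosts}
```

For example, `check_seeds(["node.dc02.domain", "node.dc05.domain"])` run on each host would show which seed addresses from the config are unreachable from that host.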
So my question is: why did this happen, and how can I fix it? Also, an off-topic question: is it possible to turn rebalancing off?