Issue when removing one node from a mesh-mode cluster


#1

The version is Aerospike CE 3.9.1. I set up a mesh-mode cluster with 6 nodes. After I removed one node from the cluster, the other nodes behaved unexpectedly.

First, two nodes crashed. The log starts with two warnings:

Jun 08 2017 03:24:30 GMT: WARNING (cf:socket): (socket.c:526) Error while connecting FD 362: 111 (Connection refused)
Jun 08 2017 03:24:30 GMT: WARNING (cf:socket): (socket.c:579) Error while connecting socket to 10.16.100.8:3002

Then a critical error:

Jun 08 2017 03:24:38 GMT: CRITICAL (cf:socket): (socket.c:278) setsockopt(6, 1) failed on FD 313: 22 (Invalid argument)

Other nodes report the same warning very frequently:

Jun 08 2017 04:02:03 GMT: WARNING (hb): (hb.c:4108) On fd 327 recv peek error
Jun 08 2017 04:02:03 GMT: WARNING (hb): (hb.c:4108) On fd 327 recv peek error
Jun 08 2017 04:02:03 GMT: WARNING (hb): (hb.c:4108) On fd 327 recv peek error

As a result, the log becomes huge.
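
To see which messages dominate a log like this, a small script can tally lines by severity, context and source location. This is only a minimal sketch, assuming every line follows the format of the excerpts quoted in this post; the name summarize_log is hypothetical and not part of any Aerospike tool.

```python
import re
import sys
from collections import Counter

# Assumed line format, taken from the excerpts in this post, e.g.
# "Jun 08 2017 04:02:03 GMT: WARNING (hb): (hb.c:4108) On fd 327 recv peek error"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2} GMT): "
    r"(?P<level>[A-Z]+) \((?P<context>[^)]+)\): "
    r"\((?P<source>[^)]+)\) (?P<message>.*)$"
)


def summarize_log(lines):
    """Count log lines per (level, context, source) triple.

    Lines that do not match the assumed format are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        key = (match["level"], match["context"], match["source"])
        counts[key] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_log.py /path/to/logfile
    with open(sys.argv[1]) as log_file:
        for (level, context, source), n in summarize_log(log_file).most_common(10):
            print(f"{n:8d}  {level} ({context}) {source}")
```

Run against the log file of a noisy node, this prints the ten most frequent message sources, which makes it easier to tell whether the volume comes from a single warning such as the hb.c:4108 one.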

If I restart a node, the newly started node seems okay: it reports warnings less frequently and then works fine, as in the following log:

Jun 08 2017 04:04:30 GMT: INFO (info): (ticker.c:433) {production} device-usage: used-bytes 853415244160 avail-pct 60 cache-read-pct 0.00
Jun 08 2017 04:04:31 GMT: WARNING (cf:socket): (socket.c:526) Error while connecting FD 340: 111 (Connection refused)
Jun 08 2017 04:04:31 GMT: WARNING (cf:socket): (socket.c:579) Error while connecting socket to 10.16.100.3:3002

Although three nodes are alive, the client reports a connection refused error.

Is this normal behavior? Does mesh mode support removing a node dynamically?


#2

There was a heartbeat (hb) crash addressed in the following release, which may be related.

http://www.aerospike.com/download/server/notes.html#3.10.0.3