Node disconnected from cluster



Hi Team,

We are running Aerospike C-3.15.1.3. Two nodes suddenly disconnected from the cluster and rejoined after some time. One node went down completely, and I had to start the Aerospike process on it myself. Below are the first log entries I could see. Any idea what happened here?

Aug 16 2018 18:31:26 GMT: INFO (info): (hist.c:156)  (04: 0000544342) (05: 0000050542) (06: 0000000229) (07: 0000000131)
Aug 16 2018 18:31:26 GMT: INFO (info): (hist.c:156)  (08: 0000000270) (09: 0000000304) (10: 0000000224) (11: 0000000004)
Aug 16 2018 18:31:26 GMT: INFO (info): (ticker.c:371) {offerengine} objects: all 80397355 master 41234998 prole 39162357 non-replica 0
Aug 16 2018 18:31:26 GMT: INFO (info): (ticker.c:416) {offerengine} migrations: complete
Aug 16 2018 18:31:26 GMT: INFO (info): (ticker.c:435) {offerengine} memory-usage: total-bytes 12735868772 index-bytes 5145430720 sindex-bytes 0 data-bytes 7590438052 used-pct 42.36
Aug 16 2018 18:31:26 GMT: INFO (info): (ticker.c:465) {offerengine} device-usage: used-bytes 25212295296 avail-pct 70
Aug 16 2018 18:31:26 GMT: INFO (info): (ticker.c:534) {offerengine} client: tsvc (0,51) proxy (1066,0,6) read (5394168,0,0,49760022) write (260822523,26345306,2386) delete (38800656,0,444,640) udf (0,0,0) lang (0,0,0,0)
Aug 16 2018 18:31:26 GMT: INFO (info): (ticker.c:562) {offerengine} batch-sub: tsvc (0,0) proxy (2816,0,374) read (1027768449,0,0,22191408446)
Aug 16 2018 18:31:26 GMT: INFO (info): (ticker.c:590) {offerengine} scan: basic (924288,2028,0) aggr (0,0,0) udf-bg (0,0,0)
Aug 16 2018 18:31:26 GMT: INFO (info): (ticker.c:684) {offerengine} retransmits: migration 2064408 client-read 0 client-write (0,6) client-delete (0,0) client-udf (0,0) batch-sub 0 udf-sub (0,0) nsup 0
Aug 16 2018 18:31:26 GMT: INFO (info): (hist.c:139) histogram dump: {offerengine}-read (55154190 total) msec
Aug 16 2018 18:31:26 GMT: INFO (info): (hist.c:156)  (00: 0055146373) (01: 0000004595) (02: 0000002730) (03: 0000000452)
Aug 16 2018 18:31:26 GMT: INFO (info): (hist.c:165)  (04: 0000000039) (05: 0000000001)
Aug 16 2018 18:31:26 GMT: INFO (info): (hist.c:139) histogram dump: {offerengine}-write (287167829 total) msec
Aug 16 2018 18:31:26 GMT: INFO (info): (hist.c:156)  (00: 0260428187) (01: 0025282620) (02: 0001051125) (03: 0000244109)
Aug 16 2018 18:31:26 GMT: INFO (info): (hist.c:156)  (04: 0000076177) (05: 0000051441) (06: 0000010722) (07: 0000007310)
Aug 16 2018 18:31:26 GMT: INFO (info): (hist.c:165)  (08: 0000011885) (09: 0000004140) (10: 0000000113)
Aug 16 2018 18:31:29 GMT: INFO (clustering): (clustering.c:6868) ignoring join request from node bb99ec254005452 since a request is already pending
Aug 16 2018 18:31:29 GMT: WARNING (socket): (socket.c:740) Timeout while connecting
Aug 16 2018 18:31:29 GMT: WARNING (socket): (socket.c:808) Error while connecting socket to 10.84.194.190:3002
Aug 16 2018 18:31:29 GMT: WARNING (hb): (hb.c:4669) could not create heartbeat connection to node {10.84.194.190:3002}
Aug 16 2018 18:31:30 GMT: INFO (clustering): (clustering.c:3033) faulty check skipped - found obsolete plugin data for node bb9bfc254005452
Aug 16 2018 18:31:30 GMT: INFO (clustering): (clustering.c:3033) faulty check skipped - found obsolete plugin data for node bb9bec254005452
Aug 16 2018 18:31:30 GMT: INFO (clustering): (clustering.c:3033) faulty check skipped - found obsolete plugin data for node bb9abc254005452
Aug 16 2018 18:31:30 GMT: INFO (clustering): (clustering.c:7772) join requests at quantum start: bb99ec254005452
Aug 16 2018 18:31:30 GMT: INFO (clustering): (clustering.c:4323) paxos round started - cluster key: e5610f45466c
Aug 16 2018 18:31:30 GMT: INFO (clustering): (clustering.c:7772) paxos round started - succession list: bb9d1c254005452 bb9c0c254005452 bb9bfc254005452 bb9bec254005452 bb9abc254005452 bb99ec254005452 bb943c454005452 bb93fc454005452 bb905c454005452
Aug 16 2018 18:31:31 GMT: INFO (clustering): (clustering.c:5485) applied new cluster key e5610f45466c
Aug 16 2018 18:31:31 GMT: INFO (clustering): (clustering.c:7772) applied new succession list bb9d1c254005452 bb9c0c254005452 bb9bfc254005452 bb9bec254005452 bb9abc254005452 bb99ec254005452 bb943c454005452 bb93fc454005452 bb905c454005452
Aug 16 2018 18:31:31 GMT: INFO (clustering): (clustering.c:5489) applied cluster size 9
Aug 16 2018 18:31:33 GMT: INFO (clustering): (clustering.c:2970) orphan check skipped - found obsolete plugin data for node bb9bec254005452
Aug 16 2018 18:31:33 GMT: INFO (clustering): (clustering.c:6860) ignoring join request from node bb9bec254005452 since it is already part of the cluster
Aug 16 2018 18:31:34 GMT: INFO (hb): (hb.c:8254) node expired bb9bec254005452
Aug 16 2018 18:31:34 GMT: INFO (fabric): (fabric.c:2472) fabric: node bb9bec254005452 departed
Aug 16 2018 18:31:34 GMT: INFO (fabric): (fabric.c:925) fabric_node_disconnect(bb9bec254005452)
Aug 16 2018 18:31:34 GMT: INFO (exchange): (exchange.c:1967) aborting partition exchange with cluster key 584a0844ea36
Aug 16 2018 18:31:34 GMT: INFO (exchange): (exchange.c:1976) data exchange started with cluster key e5610f45466c
Aug 16 2018 18:31:34 GMT: WARNING (exchange): (clustering.c:7772) error sending exchange data bb93fc454005452 bb9d1c254005452 bb905c454005452 bb9bec254005452 bb99ec254005452 bb9c0c254005452 bb943c454005452 bb9bfc254005452 bb9abc254005452
Aug 16 2018 18:31:35 GMT: WARNING (exchange): (clustering.c:7772) error sending exchange data bb9bec254005452 bb99ec254005452 bb9c0c254005452 bb943c454005452 bb9bfc254005452 bb9abc254005452
Aug