Heartbeat number of connections per node

evinet · May 13, 2015, 9:11am

Hi, I have a 4-node cluster running version 3.5.8 CE with a single namespace.

Analyzing the heartbeat info logs, I extracted the “trans_in_progress” lines to see the hb connections.

Server 1:

trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (27, 790021, 789994) : hb (5, 464, 459) : fab (58, 128, 70)

Server 2:

trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (22, 754969, 754947) : hb (4, 96, 92) : fab (58, 116, 58)

Server 3:

trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (20, 758292, 758272) : hb (6, 552, 546) : fab (58, 160, 102)

Server 4:

trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (16, 664968, 664952) : hb (5, 16, 11) : fab (58, 114, 56)

Is it normal that the number of heartbeat connections is not the same everywhere? Shouldn't it be 6 (2 directions for each other node)?

I’m surprised to see different values in each log. Does it mean I have network problems?

Thanks a lot.

Emmanuel

evinet · May 13, 2015, 2:16pm

Looking at the logs with the detail level enabled, I see that several connections are open to the same nodes:

May 13 2015 13:43:37 GMT: DETAIL (hb): (hb.c:as_hb_rx_process:1716) Got heartbeat pulse from node identifying itself as 10.240.12.31:3002
May 13 2015 13:43:37 GMT: DETAIL (hb): (hb.c:as_hb_rx_process:1716) Got heartbeat pulse from node identifying itself as 10.240.12.31:3002
May 13 2015 13:43:37 GMT: DETAIL (hb): (hb.c:as_hb_rx_process:1716) Got heartbeat pulse from node identifying itself as 10.240.118.17:3002
May 13 2015 13:43:37 GMT: DETAIL (hb): (hb.c:as_hb_rx_process:1716) Got heartbeat pulse from node identifying itself as 10.240.226.153:3002
May 13 2015 13:43:37 GMT: DETAIL (hb): (hb.c:as_hb_rx_process:1716) Got heartbeat pulse from node identifying itself as 10.240.226.153:3002
May 13 2015 13:43:37 GMT: DETAIL (hb): (hb.c:as_hb_thr:2106) sending tcp heartbeat to index 97 : msg size 339
May 13 2015 13:43:37 GMT: DETAIL (hb): (hb.c:as_hb_thr:2106) sending tcp heartbeat to index 104 : msg size 339
May 13 2015 13:43:37 GMT: DETAIL (hb): (hb.c:as_hb_thr:2106) sending tcp heartbeat to index 116 : msg size 339
May 13 2015 13:43:37 GMT: DETAIL (hb): (hb.c:as_hb_thr:2106) sending tcp heartbeat to index 131 : msg size 339
May 13 2015 13:43:37 GMT: DETAIL (hb): (hb.c:as_hb_thr:2106) sending tcp heartbeat to index 150 : msg size 339

Is this normal?

I also found another problem at the debug level:
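One way to quantify this from the DETAIL output is to count the distinct "index" values in the "sending tcp heartbeat" lines. The sketch below is only illustrative: it assumes each distinct index corresponds to one heartbeat socket, which the thread does not confirm, and the function name `distinct_send_targets` is made up for this example.

```python
import re

# Matches the DETAIL lines emitted when a heartbeat is sent over TCP.
SEND = re.compile(r"sending tcp heartbeat to index (\d+)")

def distinct_send_targets(log_lines):
    """Return the sorted distinct 'index' values heartbeats were sent to.

    Assumption: one distinct index == one heartbeat socket.
    """
    found = set()
    for line in log_lines:
        m = SEND.search(line)
        if m:
            found.add(int(m.group(1)))
    return sorted(found)

if __name__ == "__main__":
    sample = [
        "DETAIL (hb): (hb.c:as_hb_thr:2106) sending tcp heartbeat to index 97 : msg size 339",
        "DETAIL (hb): (hb.c:as_hb_thr:2106) sending tcp heartbeat to index 104 : msg size 339",
        "DETAIL (hb): (hb.c:as_hb_thr:2106) sending tcp heartbeat to index 116 : msg size 339",
        "DETAIL (hb): (hb.c:as_hb_thr:2106) sending tcp heartbeat to index 131 : msg size 339",
        "DETAIL (hb): (hb.c:as_hb_thr:2106) sending tcp heartbeat to index 150 : msg size 339",
    ]
    targets = distinct_send_targets(sample)
    print(len(targets), targets)
```

On the excerpt above this finds 5 distinct indexes, while a 4-node cluster has only 3 peers per node.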

May 13 2015 13:43:59 GMT: DEBUG (hb): (hb.c:as_hb_try_connecting_remote:1028) could not create heartbeat connection to node 10.240.112.35:3002

Heartbeat keeps trying to connect to a node that has been dead for more than a week. The cluster was fully restarted after this node died, because the node was replaced with a new GCE instance with local SSD and its IP address changed.

Thanks

kporter · May 13, 2015, 3:50pm

The ideal number of heartbeat sockets on a given node is the cluster size minus 1. There is a benign issue, however, where 2 sockets may exist to some nodes, which is what you are seeing.

As for the connection to the dead node, is or was the old IP still in the aerospike.conf at the time the server started? Each server periodically checks whether the nodes defined there have returned. To clear that issue you should only need to restart each node after the dead node has been removed from the configs.
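As a rough way to check this against the log lines posted above, here is a minimal Python sketch. It assumes the three numbers in each `fds` group are (currently open, total opened, total closed); that reading is not stated in the thread, but it is consistent with the posted values (for example 464 - 459 = 5). The function names are made up for illustration.

```python
import re

# One "name (a, b, c)" group from the fds section of a trans_in_progress line.
FD_GROUP = re.compile(r"(proto|hb|fab) \((\d+), (\d+), (\d+)\)")

def parse_fds(line):
    """Return {name: (open_now, opened, closed)} for each fd group found."""
    return {
        m.group(1): tuple(int(x) for x in m.group(2, 3, 4))
        for m in FD_GROUP.finditer(line)
    }

def hb_report(line, cluster_size):
    """Compare open heartbeat sockets with the ideal of cluster_size - 1."""
    open_now, opened, closed = parse_fds(line)["hb"]
    return {
        "open": open_now,
        # Assumption: open == opened - closed if the tuple is read correctly.
        "consistent": opened - closed == open_now,
        "extra": open_now - (cluster_size - 1),
    }

if __name__ == "__main__":
    server1 = ("trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 "
               "::: dq 0 : fds - proto (27, 790021, 789994) : hb (5, 464, 459) "
               ": fab (58, 128, 70)")
    print(hb_report(server1, cluster_size=4))
```

For Server 1 this reports 5 open heartbeat sockets, 2 more than the ideal of 3 for a 4-node cluster.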

evinet · May 15, 2015, 5:04am

Ok, thanks a lot for your reply. Emmanuel
