New node not getting added in running cluster


#1

Hi

I am new to Aerospike and trying to use it in my project.

My problem is that a new node is not getting added to the running cluster. I am using multicast and assumed that all I have to do is run the Aerospike server on the new node with the same multicast address/port (as also mentioned here). It should then get picked up and added to the cluster, but that is not happening.

I also tried “Add new node” through AMC, but got a message that it’s not possible because this node doesn’t belong to the same cluster. I am not sure what to make of that message.

Please provide some suggestions or point me to a document on this. Thanks!


#2

Can you test if your network is passing multicast traffic? We have procedures using MTools here. If you can, shut off the Aerospike servers and test with the same multicast address; if not, test with the servers’ multicast address with the last octet changed.

Are you using a cloud provider such as AWS? If so, most cloud hosts do not support multicast.

Do the nodes need to go through a router or a layer 3 switch?
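The multicast check suggested above can also be sketched without MTools. This is only a rough illustration in Python, not the documented procedure: the function names are made up for this example, and the group address and port you pass in are whatever your own configuration uses. Run `receive_one` on one host and `send_probe` on another; if the receiver times out, multicast traffic is probably not getting through.

```python
import socket


def with_last_octet(group: str, last_octet: int) -> str:
    """Return the group address with its last octet replaced (hypothetical helper)."""
    if not 0 <= last_octet <= 255:
        raise ValueError("last octet must be in 0..255")
    parts = group.split(".")
    if len(parts) != 4:
        raise ValueError("expected a dotted IPv4 address")
    parts[3] = str(last_octet)
    return ".".join(parts)


def membership_request(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq value used with IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def receive_one(group: str, port: int, timeout: float = 10.0) -> bytes:
    """Join the multicast group and wait for a single datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)  # raises a timeout error if nothing arrives
        data, _sender = sock.recvfrom(1024)
        return data
    finally:
        sock.close()


def send_probe(group: str, port: int, payload: bytes = b"multicast-probe",
               ttl: int = 1) -> None:
    """Send one datagram to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # A TTL of 1 keeps the probe on the local subnet; raise it if the
        # nodes are separated by a router.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

If the Aerospike servers have to stay up, `with_last_octet` gives a neighbouring group address to test with instead of the one the servers are using.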


#3

Hi @kporter, I need your help here again :smile: . I was able to create a multi-node cluster on bare metal using multicast. I am now trying to do the same on VMs, and this time I have to use mesh-style heartbeat, but the cluster is not getting formed. Please see my config below and provide your suggestions. Thanks.

Config in VM1:

service {
        address any
        port 3000
        access-address  <ip-vm1>
        network-interface-name  lo
}

heartbeat {
        mode mesh
        address  <ip-vm1>
        port 3002
        mesh-seed-address-port <ip-vm2>  3002
        interval 250
        timeout 10
}

Config on VM2:

service {
        address any
        port 3000
        access-address  <ip-vm2>
        network-interface-name  lo
}

heartbeat {
        mode mesh
        address  <ip-vm2>
        port 3002
        mesh-seed-address-port <ip-vm1>  3002
        interval 250
        timeout 10
}

Please note that since this is a VM, I don’t have a regular eth0-type interface name. I am using ‘lo’.

Output of ip link show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: venet0: <BROADCAST,POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
        link/void

Kindly provide some pointers on this. Thanks!


#4

Hi @kporter

Just to provide more info regarding my problem, I am seeing the message below in the log of AS1:

Apr 07 2015 10:18:40 GMT: INFO (hb): (hb.c::1042) initiated connection to mesh host at <ip-vm2>:3002 socket 66 from <ip-vm2:3002> 

This log line appears as soon as I start AS2 (while AS1 is already up). I also see zero foreign heartbeats in the log:

Apr 07 2015 10:18:43 GMT: INFO (info): (thr_info.c::4614)    heartbeat_received: self 25 : foreign 0
Apr 07 2015 10:19:03 GMT: INFO (info): (thr_info.c::4615)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0

Kindly provide any pointers :smile:

If I can provide any other info from log/config that could be helpful, please let me know. Thanks a lot!


#5

Our node IDs are a function of the heartbeat port and the MAC address of the interface defined by network-interface-name, which you have unfortunately configured to loopback on both nodes.

The preferred solution is to set network-interface-name to an interface which has a unique MAC address within the cluster.
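To pick such an interface, one rough way to check is to read each interface's MAC address from sysfs on every node. The sketch below is Python and Linux-only; the helper names are invented for this example, and it assumes the kernel exposes the address at /sys/class/net/<iface>/address, which may not hold for every virtual interface type.

```python
from pathlib import Path
from typing import Optional


def read_mac(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[str]:
    """Return the MAC address string for iface, or None if it cannot be read."""
    path = Path(sysfs_root) / iface / "address"
    try:
        text = path.read_text().strip()
    except OSError:
        return None
    return text or None


def is_distinct_mac(mac: Optional[str]) -> bool:
    """True if mac looks like a 6-byte address that is not all zeros."""
    if not mac:
        return False
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        return False
    try:
        return int(hex_digits, 16) != 0
    except ValueError:
        return False


if __name__ == "__main__":
    # Interface names taken from the `ip link show` output in this thread.
    for name in ("lo", "venet0"):
        mac = read_mac(name)
        print(name, mac, "usable" if is_distinct_mac(mac) else "not usable")
```

An interface that passes this check on one node still has to differ from the MAC used on every other node in the cluster.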

Alternatively (and this is a bit of a hack, so feel free to stop reading), you could use rack-aware and configure a single rack. With rack-aware you can configure your own node IDs, but you will also get some warnings about not having any other racks. I guess you could get around those warnings if each node had a unique rack and node ID.

Hope this helps.


#6

Hi @kporter

Thanks a lot for your reply. I had actually tried both options (‘lo’ and ‘venet0’). My bad, I should have put that information here. With ‘venet0’ as well, clustering doesn’t happen.

Thanks for mentioning the alternative as well, but I am not using the ‘rack-aware’ option.

Any other pointers you could provide ?


#7

The self counter should not be incrementing in a mesh configuration; we only expect it to increment when using multicast. I suspect it did here because this log was from the run that had both nodes using the same MAC address. Could you provide those log lines while running with venet0?

Also could you provide the output of the following on both nodes:

asinfo -v "node"
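Once the `asinfo -v "node"` output has been collected from each node, a small script can flag IDs that collide. This is just an illustrative Python sketch; the function name and the sample host names and IDs are hypothetical, and gathering the values is left to you.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_node_ids(node_ids: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each node ID reported by more than one host to those hosts."""
    hosts_by_id = defaultdict(list)
    for host, node_id in node_ids.items():
        # Normalise whitespace and case so identical IDs compare equal.
        hosts_by_id[node_id.strip().upper()].append(host)
    return {nid: hosts for nid, hosts in hosts_by_id.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Hypothetical values; paste in the real asinfo output per host.
    reported = {"vm1": "BB9A1B2C3D4E5F6", "vm2": "bb9a1b2c3d4e5f6\n"}
    print(find_duplicate_node_ids(reported))
```

An empty result means every node reported a different ID; any entry in the result lists hosts that share one.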

#8

Hi @kporter

With venet0, I still see similar message:

Apr 08 2015 08:36:29 GMT: INFO (info): (thr_info.c::4614)    heartbeat_received: self 243 : foreign 0
Apr 08 2015 08:36:29 GMT: INFO (info): (thr_info.c::4615)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 

Output on AS1:

/opt/aerospike/bin/asinfo -v "node"
BB9000000000000

Output on AS2:

/opt/aerospike/bin/asinfo -v "node"
BB9000000000000

As of now, both IDs are the same. I guess they should be different when the nodes are in the same cluster? Thanks.


#9

Hi @kporter

Here are more log lines in case they provide a clue :smile:

In VM1:

Apr 08 2015 12:22:24 GMT: INFO (hb): (hb.c::1961) connecting to remote heartbeat service at <ip-vm2>

Similar line in VM2 as well.

I am also able to telnet to each server from the other on port 3002.

Could it be because of the known AS issue mentioned here: Mesh heartbeat - "unable to parse heartbeat message" [Released] [Resolved] ?

I am really sorry to bug you for such a small issue. But I am kind of stuck after having tried out many options and still not getting my head around this. :smile: In case you can make any other conclusions with the limited information you have on my scenario, kindly let me know. Thanks.


#10

Hi @kporter

An update: I was on Debian 6 earlier. Upgrading to Debian 7 has fixed my problem, and clustering is happening properly there. But I am not sure if the issue is in AS’s build for Debian 6 or in Debian 6 itself. Thanks for your pointers.


#11

Glad you solved the issue. The problem was that the node IDs were the same, which must mean that venet0’s MAC address was all 0s.

The node id can be read as follows:

<hex hb port><mac address reversed>
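As a rough illustration of that layout, the Python sketch below splits a node ID into its port prefix and MAC. It assumes the last 12 hex digits are the MAC with its bytes in reverse order and that everything before them is the port in hex; that reading of "reversed" and the function name are assumptions made for this example.

```python
from typing import Tuple


def decode_node_id(node_id: str) -> Tuple[int, str]:
    """Split a node ID into (port, MAC) under the layout assumed above."""
    node_id = node_id.strip()
    if len(node_id) <= 12:
        raise ValueError("node ID too short to contain a port and a MAC")
    port_hex, mac_hex = node_id[:-12], node_id[-12:]
    port = int(port_hex, 16)
    # Split the MAC part into bytes and undo the assumed byte reversal.
    mac_bytes = [mac_hex[i:i + 2] for i in range(0, 12, 2)]
    mac = ":".join(reversed(mac_bytes)).lower()
    return port, mac


if __name__ == "__main__":
    print(decode_node_id("BB9000000000000"))  # -> (3001, '00:00:00:00:00:00')
```

For the ID reported on both nodes in this thread, the MAC part comes out as all zeros, which matches the explanation above.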

#12

Hi @kporter

Looks like some lines in your previous answer got missed. Can you please provide them? Just for future reference :smile:

And I think you are right regarding the MAC address. This is my output:

ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: venet0: <BROADCAST,POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/void <MAC should have been here i guess. Can you confirm pls ?>
    inet 127.0.0.2/32 scope host venet0
    inet <ip-vm1>/32 scope global venet0:0

Can you confirm whether this is a missing MAC for venet0? Thanks.


#13

I would expect a MAC address there, but I am not sure what to make of its absence.