Mesh Cluster using Docker: Unable to form 2-node cluster

Hello

I’m trying out the Aerospike Docker image (aerospike/aerospike-server), and I’m unable to form a simple 2-node cluster.

One node is hosted on an Ubuntu physical machine, while the other node is hosted on a Windows 10 physical machine:

Ubuntu host: 172.16.1.82 (ports 3000-3003)
Windows host: 172.16.1.63 (ports 3010-3013)

Both of them are using the config file below:

network {
  service {
    address any                        
    port 3000                      # 3010 in Windows cfg file                     
    access-address 172.16.1.82     # 172.16.1.63 in Windows cfg file
  }

  fabric {
    address any
    port 3001      # 3011 in Windows cfg file
  }

  info {
    address any
    port 3003      # 3013 in Windows cfg file
  }

  heartbeat {
    mode mesh  
    address any
    port 3002    # 3012 in Windows cfg file 
    mesh-seed-address-port 172.16.1.82 3002    
    mesh-seed-address-port 172.16.1.63 3012   
    interval 150
    timeout 10
  }
}

namespace test {
  memory-size 1G
  replication-factor 2
  storage-engine memory
}
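
Since the two config files differ only in their ports and access-address, one way to keep the heartbeat section consistent across nodes is to generate it from a single node list. A minimal sketch in Python: `NODES` and `render_heartbeat` are hypothetical names introduced for illustration, and the hosts and ports are the ones quoted in the config above.

```python
# Hypothetical helper: render the heartbeat stanza for one node.
# Hosts and base ports are copied from the post; heartbeat uses base_port + 2.
NODES = [
    {"host": "172.16.1.82", "base_port": 3000},  # Ubuntu host
    {"host": "172.16.1.63", "base_port": 3010},  # Windows host
]


def render_heartbeat(node, nodes):
    """Return the heartbeat stanza for `node`, seeding every node in `nodes`."""
    lines = [
        "  heartbeat {",
        "    mode mesh",
        "    address any",
        f"    port {node['base_port'] + 2}",
    ]
    for peer in nodes:
        # One seed line per node, in the same order in every generated file.
        lines.append(
            f"    mesh-seed-address-port {peer['host']} {peer['base_port'] + 2}"
        )
    lines += ["    interval 150", "    timeout 10", "  }"]
    return "\n".join(lines)


if __name__ == "__main__":
    for n in NODES:
        print(f"# heartbeat stanza for {n['host']}")
        print(render_heartbeat(n, NODES))
```

This only covers the heartbeat block; the service, fabric and info ports would follow the same base-port-plus-offset pattern.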

Launch command that I’m using:

# Replace ports with 3010 to 3013 if on the Windows machine
docker run --name aerospike \
  -p 3000:3000 -p 3001:3001 -p 3002:3002 -p 3003:3003 \
  -v $NODE_CFG_DIR:/opt/aerospike/etc \
  -v $NODE_DATA_DIR:/opt/aerospike/data \
  aerospike/aerospike-server \
  asd --foreground \
  --config-file /opt/aerospike/etc/aerospike.conf

If I launch only a single node, things work fine; I can connect to the node via aql and asadm. I can even connect to the node using aql from the other machine.

Here’s the output for asadm -e info for the single node on Ubuntu; it looks ok:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2019-12-02 00:35:44 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Node               Node                 Ip       Build   Cluster   Migrations        Cluster     Cluster         Principal   Client     Uptime
               .                 Id                  .           .      Size            .            Key   Integrity                 .    Conns          .
nox-MS-7673:3000   *BB9020011AC4202   172.16.1.82:3000   C-4.7.0.5         1      0.000     BE28EEDF898B   True        BB9020011AC4202        1   00:00:45
Number of rows: 1

In the 2-node case, I’ll launch one node (say on the Ubuntu machine) first, and subsequently the second node on the other machine.

What happens is that the first node launched will still believe it’s in a cluster size of 1 according to asadm -e info. However, the second node will actually go from cluster size 1 to 0:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2019-12-02 00:42:37 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
            Node               Node                 Ip       Build   Cluster   Migrations   Cluster     Cluster         Principal   Client     Uptime 
               .                 Id                  .           .      Size            .       Key   Integrity                 .    Conns          . 
172.16.1.63:3010   *BC3020011AC4202   172.16.1.63:3010   C-4.7.0.5         0      0.000           0   False       BC3020011AC4202        3   00:01:00 
Number of rows: 1                                                                                                                                     

The thing is, the two nodes do seem to be communicating, at least. If I Ctrl+C either of the two running nodes, the other node reports a "could not create heartbeat connection to node" error message, which indicates that a heartbeat connection had previously been established between the two machines.
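
One way to double-check that the published heartbeat and fabric ports are reachable from the other machine is a plain TCP connect test. The following is a minimal sketch in Python; `can_connect` is a hypothetical helper, not part of any Aerospike tooling. A successful connect only shows that the port mapping works from that machine. It does not confirm that the address a node advertises to its peers is reachable.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example (ports from the configs above), run from the Ubuntu host:
#   can_connect("172.16.1.63", 3012)   # heartbeat port of the Windows node
#   can_connect("172.16.1.63", 3011)   # fabric port of the Windows node
```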

Any ideas that could help resolve this problem? There aren’t any other output messages being reported by the nodes.

thanks!

Hi,

I have a two-node Aerospike cluster running in Docker on GCP. The nodes are on Google Dataproc, and this is the configuration for the seed node, i.e. IP address 10.154.0.23:

network {
    service {
        address 10.154.0.23
        port 3000

        # Uncomment the following to set the `access-address` parameter to the
        # IP address of the Docker host. This will allow the server to correctly
        # publish the address which applications and other nodes in the cluster to
        # use when addressing this node.
        access-address 10.154.0.23

    }

    heartbeat {

        # mesh is used for environments that do not support multicast
        mode mesh
        ##address 10.154.0.23       # (Optional) (Default: any) IP of the NIC on

        port 3002

        mesh-seed-address-port 10.154.0.23  3002 # IP address for seed node in the cluster
                                                          # This IP happens to be the local node
        mesh-seed-address-port 10.154.0.21 3002 # IP address for seed node in the cluster
                                                          #
        interval 150
        timeout 10
    }

    fabric {
        port 3001
    }

    info {
        port 3003
    }
}

And this is the second node 10.154.0.21

network {
    service {
        address 10.154.0.21
        port 3000

        # Uncomment the following to set the `access-address` parameter to the
        # IP address of the Docker host. This will allow the server to correctly
        # publish the address which applications and other nodes in the cluster to
        # use when addressing this node.
        access-address 10.154.0.21
    }

    heartbeat {

        # mesh is used for environments that do not support multicast
        mode mesh
        ##address 10.154.0.21       # (Optional) (Default: any) IP of the NIC on

        port 3002

        mesh-seed-address-port 10.154.0.23  3002 # IP address for seed node in the cluster
        mesh-seed-address-port 10.154.0.21 3002 # IP address for this node in the cluster
                                                          #
        interval 150
        timeout 10
    }

    fabric {
        port 3001
    }

    info {
        port 3003
    }
}

It works fine. Note that mine sets an explicit service address (address 10.154.0.23 for the first node and address 10.154.0.21 for the second), whereas yours uses address any. Also note the ordering of the mesh-seed-address-port lines.

HTH

Since you have two physical hosts, I believe you would have to use a Docker overlay network to ensure proper communication between the nodes. Another option is to use --network host and force the container to use the host network.

Hm

This is assumed to be on an on-prem host. Mapping the container port to the host port (3000) is perfectly valid, since there is really no reason to use the same port for anything else. Besides, having two Aerospike containers on the same physical host (both using port 3000) does not make sense.

docker run -tid --net=host …..

HTH

Thanks for the helpful replies, guys.

After spending a lot of time trying out the suggestions, I gave up and ended up installing an Ubuntu dual-boot partition on the Windows machine.

There are a lot of unresolved issues and bugs with the Windows version of Docker. In particular, --net=host is unsupported, and initiating a swarm does not open port 2377 according to netstat -an | grep LISTEN. For other people reading this, I recommend staying away from Docker Desktop for Windows.

To get the two nodes to connect (both nodes are running on two physical Ubuntu machines), I needed to use --net=host for the docker run command in order for the nodes to join as a cluster.
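
For anyone wanting to reproduce this, the launch command from my first post changes roughly as follows: the -p port mappings are dropped and --net=host is added. A minimal sketch in Python that assembles the command; `docker_run_args` is a hypothetical helper, and the two directory paths are placeholders standing in for $NODE_CFG_DIR and $NODE_DATA_DIR.

```python
import shlex


def docker_run_args(cfg_dir, data_dir, name="aerospike"):
    """Build the `docker run` argument list from the original post with host networking."""
    return [
        "docker", "run", "--name", name,
        # Host networking replaces the -p mappings: the container shares
        # the host's network stack, so the configured ports bind directly.
        "--net=host",
        "-v", f"{cfg_dir}:/opt/aerospike/etc",
        "-v", f"{data_dir}:/opt/aerospike/data",
        "aerospike/aerospike-server",
        "asd", "--foreground",
        "--config-file", "/opt/aerospike/etc/aerospike.conf",
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute the real config and data directories.
    print(shlex.join(docker_run_args("/srv/aerospike/etc", "/srv/aerospike/data")))
```

With host networking, both nodes can use the same default ports (3000 to 3003) since they run on separate machines.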

Without --net=host, a cluster doesn’t form and I’d get these warning messages from the aerospike logs:

Node 1: WARNING (hb): (hb.c:8452) ignoring delayed heartbeat - expected timestamp less than 103253265901486080 but was 103253265899388928 from node: bb9020011ac4202

Node 2: WARNING (hb): (hb.c:3734) found a socket 0x7fe6f7c287c0 without an associated channel
