Issues with restarting an instance?

Ewan_Walker · September 24, 2019, 7:15pm

I have a very basic cluster, something similar to this:

version: '3'
services:
  aerospike:
    image: aerospike:4.6.0.2
    network_mode: "host"
    command: ["--config-file" ,"/opt/aerospike/etc/aerospike.conf"]
    volumes:
      - ./data:/opt/aerospike/data
      - ./etc:/opt/aerospike/etc
    ulimits:
      nofile:
        soft: "65536"
        hard: "65536"
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: "10"

This is running on two separate servers. If I run docker-compose down --rmi all and start the clusters again, the nodes seem to work and join, but they are somehow forever doing migrations (with no data?).

My problems are:

Somehow I can start a cluster with size 0?
Good chance of seeing the following being spammed in the logs: WARNING (hardware): (hardware.c:2262) failed to resolve mounted device /dev/md1: 2 (No such file or directory)
Restarting a single instance essentially breaks the cluster; the node no longer rejoins, citing:

skipping forming cluster - cannot form new cluster from pending join requests (empty)

or

join request timed out for principal bb96c64c0902500

or

ctrl ack (14): unexpected source bb96c64c0902500

So far, the only thing that resolves this is completely deleting the image and trying again, which is not production friendly.

The other issue I see is that migrations take quite literally forever - even though there is no data in the cluster at all.

The servers are running SSDs, and my configuration is pretty much the default (like aerospike.conf on the master branch of aerospike/aerospike-server on GitHub).

What am I missing?

lucien · September 24, 2019, 7:59pm

If you are using a network_mode of host, what is the config for each of the Aerospike nodes in the cluster? You may have to configure the service, heartbeat and fabric addresses to use the right interface to communicate with the other server. Please see:
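As a rough illustration of what that could look like (a sketch only, not taken from the thread): assume this node's address is 10.0.0.1, the other server is 10.0.0.2, heartbeat runs in mesh mode, and the default ports are used. All of these values are hypothetical placeholders. The network stanza of aerospike.conf on the first node might then read:

```
network {
        service {
                address 10.0.0.1        # assumed address of this node's interface
                port 3000
        }

        heartbeat {
                mode mesh               # assumption: mesh rather than multicast
                address 10.0.0.1
                port 3002
                mesh-seed-address-port 10.0.0.1 3002
                mesh-seed-address-port 10.0.0.2 3002    # assumed address of the other server
                interval 150
                timeout 10
        }

        fabric {
                address 10.0.0.1
                port 3001
        }

        info {
                port 3003
        }
}
```

The second node would use the same stanza with its own address in the three address lines.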

Ewan_Walker · September 25, 2019, 12:37pm

Thank you, I did not explicitly set those and it seems that it functions alright now.

The only issue I am concerned about is the following:

WARNING (hardware): (hardware.c:2262) failed to resolve mounted device /dev/md1: 2 (No such file or directory)

It doesn't seem to affect anything?

kporter · September 25, 2019, 6:40pm

I don’t see ‘/dev/md1’ in the configuration file you linked to. Did you update the configuration file? If so, could you share the updates?

Ewan_Walker · September 25, 2019, 6:44pm

I never specified any device except for:

storage-engine device {
        file /opt/aerospike/data/data.dat
        filesize 25G
        data-in-memory true
}

That file is inside the Docker container; the underlying device, however, is /dev/md1.

kporter · September 25, 2019, 7:03pm

OK, from within the container, are you able to access /dev/md1? I suspect not; this seems like an issue with hardware.c and Docker containers. Out of curiosity, are you running the container with --privileged? If not, could you check whether the issue behaves the same with this flag? This may explain why we haven’t seen this internally.

We collect stats on the device to monitor the device's health; this warning says that we are unable to resolve this device. It should be benign to Aerospike functionality. I’ll discuss this with the maintainers of hardware.c.

Ewan_Walker · September 25, 2019, 7:29pm

Unfortunately I can't run with privileged, so that confirms I am not running it in that manner.

tlo · September 26, 2019, 2:43pm

Ewan, could you run the following two commands inside the container and share their output?

cat /proc/mounts
ls -l /dev

It does seem like /proc/mounts properly indicates that /opt/aerospike/data resides on /dev/md1. But it also seems like /dev does not contain a device node for md1. Hence I’d like to double-check the contents of /proc/mounts as well as what’s in /dev inside the container.
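The same check can be scripted. The following is a minimal sketch, not anything Aerospike ships: the function names backing_device and check_device_node are made up for illustration, and it assumes a POSIX shell with awk and mount points without spaces. It finds the mount entry that covers a path by longest-prefix match and then tests whether the reported device has a node under /dev.

```shell
# Print the device that backs PATH, using a longest-prefix match on the
# mount points listed in MOUNTS_FILE (default: /proc/mounts).
# Usage: backing_device PATH [MOUNTS_FILE]
backing_device() {
    path=$1
    mounts=${2:-/proc/mounts}
    awk -v p="$path" '
        {
            mp = $2
            # A mount point covers the path if it is "/", equals the path,
            # or is a parent directory of the path.
            if (mp == "/" || p == mp || index(p, mp "/") == 1) {
                # ">=" lets a later mount on the same point win.
                if (length(mp) >= best_len) { best_len = length(mp); dev = $1 }
            }
        }
        END { if (dev != "") print dev }
    ' "$mounts"
}

# Report whether the device behind PATH has a node visible in this container.
# Usage: check_device_node PATH [MOUNTS_FILE]
check_device_node() {
    dev=$(backing_device "$1" "${2:-/proc/mounts}")
    case "$dev" in
        "")     echo "no mount entry found for $1" ;;
        /dev/*) if [ -e "$dev" ]; then
                    echo "$dev exists"
                else
                    echo "$dev is listed in the mount table but has no node under /dev"
                fi ;;
        *)      echo "$1 is backed by $dev, which is not a /dev path" ;;
    esac
}
```

Inside the container, running check_device_node /opt/aerospike/data would show whether the situation described above applies.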
