Issues with restarting an instance?

I have a very basic cluster, set up with something similar to the following docker-compose file:

version: '3'
services:
  aerospike:
    image: aerospike:4.6.0.2
    network_mode: "host"
    command: ["--config-file" ,"/opt/aerospike/etc/aerospike.conf"]
    volumes:
      - ./data:/opt/aerospike/data
      - ./etc:/opt/aerospike/etc
    ulimits:
      nofile:
        soft: "65536"
        hard: "65536"
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: "10"

This is running on two separate servers. If I run docker-compose down --rmi all and start the clusters, they seem to work and join; however, they are somehow forever doing migrations (with no data?).

My problems are:

  • Somehow I can start a cluster with size 0?
  • Good chance of seeing the following being spammed in the logs: WARNING (hardware): (hardware.c:2262) failed to resolve mounted device /dev/md1: 2 (No such file or directory)
  • Restarting a single instance essentially breaks the cluster, and the instance no longer rejoins, citing:

skipping forming cluster - cannot form new cluster from pending join requests (empty)

or

join request timed out for principal bb96c64c0902500

or

ctrl ack (14): unexpected source bb96c64c0902500

The only thing that has resolved this so far is completely deleting the image and trying again, which is not production-friendly.

The other issue I see is that migrations take quite literally forever - even though there is no data in the cluster at all.

The servers are running SSDs, and my configuration is pretty much the default (like the aerospike.conf on the master branch of the aerospike/aerospike-server repository on GitHub).

What am I missing?

If you are using a network_mode of host, what is the config for each of the Aerospike nodes in the cluster? You may have to configure the service, heartbeat, and fabric addresses to use the right interface to communicate with the other server. Please see:
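
For illustration only, a network stanza that pins those three addresses to one interface might look roughly like the sketch below. This is not the configuration from this thread: the IP addresses 10.0.0.1 (this node) and 10.0.0.2 (the other server) are placeholders, mesh heartbeat mode is an assumption, and the ports shown are the conventional defaults.

```
network {
    service {
        # Placeholder: IP of the interface clients should connect to on this node
        address 10.0.0.1
        port 3000
    }

    heartbeat {
        # Assumption: mesh (unicast) heartbeats between the two servers
        mode mesh
        address 10.0.0.1
        port 3002
        # Placeholder: heartbeat address and port of the other server
        mesh-seed-address-port 10.0.0.2 3002
        interval 150
        timeout 10
    }

    fabric {
        # Placeholder: same interface, used for node-to-node traffic such as migrations
        address 10.0.0.1
        port 3001
    }

    info {
        port 3003
    }
}
```

On the second server the two placeholder addresses would be swapped.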

Thank you, I did not explicitly set those and it seems that it functions alright now.

The only issue I am concerned about is the following:

WARNING (hardware): (hardware.c:2262) failed to resolve mounted device /dev/md1: 2 (No such file or directory)

It doesn't seem to affect anything?

I don’t see ‘/dev/md1’ in the configuration file you linked to. Did you update the configuration file? If so, could you share the updates?

I never specified any device except for:

storage-engine device {
        file /opt/aerospike/data/data.dat
        filesize 25G
        data-in-memory true
}

That file is inside the Docker container; the underlying device, however, is /dev/md1.

OK, from within the container, are you able to access /dev/md1? I suspect not; this seems like an issue with hardware.c and Docker containers. Out of curiosity, are you running the container with --privileged? If not, could you check whether the issue behaves the same with this flag? This may explain why we haven’t seen this internally.

We collect stats on the device to monitor its health; this warning says that we are unable to resolve the device. It should be benign to Aerospike functionality. I’ll discuss this with the maintainers of hardware.c.

Unfortunately, I can't run with --privileged, so I can confirm that I am not running it in that manner.

Ewan, could you run the following two commands inside the container and share their output?

  • cat /proc/mounts
  • ls -l /dev

It does seem like /proc/mounts properly indicates that /opt/aerospike/data resides on /dev/md1. But it also seems like /dev does not contain a device node for md1. Hence I’d like to double-check the contents of /proc/mounts as well as what’s in /dev inside the container.
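
If it helps, the same comparison can be done in one go with a small script. This is only a rough sketch: the function name missing_devices is made up for illustration, and it simply lists every /dev/... entry in the mounts table that has no matching node in /dev.

```shell
#!/bin/sh
# Print each /dev/* mount source from a mounts table whose device node is missing.
# Usage: missing_devices [mounts_file]   (defaults to /proc/mounts)
# Note: missing_devices is a hypothetical helper name, not an Aerospike tool.
missing_devices() {
    mounts_file="${1:-/proc/mounts}"
    if [ ! -r "$mounts_file" ]; then
        echo "cannot read $mounts_file" >&2
        return 1
    fi
    # The first field of each line is the mount source; keep only /dev/* entries.
    awk '$1 ~ /^\/dev\// { print $1 }' "$mounts_file" | sort -u |
    while read -r dev; do
        # Report the entry if nothing exists at that path inside this container.
        [ -e "$dev" ] || echo "$dev"
    done
}

missing_devices
```

If /dev/md1 shows up in the output when this is run inside the container, that would match what the warning describes: the mount table names the device, but there is no node for it under /dev.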