We're successfully running 2 Aerospike nodes (within Docker) on the same host (different ports, --net=host). If we start a 3rd Aerospike node it does not start; here's the trace:
This is telling you that the access-address is not a real address on this machine. For Docker you often need to specify this parameter with the virtual flag. Oddly, I would expect the output to be "external address: ADDRESS is not…" but somehow the address is null. Could you share your config?
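In other words, when the address the node should publish is not actually bound to a local interface (common with Docker and NAT'd setups), the virtual keyword lets the node advertise it anyway. A minimal sketch of the relevant stanza, with a placeholder IP standing in for the Docker host's address:

network {
    service {
        address any
        port 3000
        # 'virtual' marks the access address as one that is not bound
        # to a local interface (the IP below is a placeholder).
        access-address 10.0.0.1 virtual
    }
}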
PS: Now two of our Docker containers crashed (SIGABRT?), and it seems they won't start again (same stack trace). Sadly we have no logs for them because they were not mapped to the host…
We're running on OVH, and we read / thought they use special network routing to allow assigning all IPs of a block (including the network address and broadcast address), as long as you use the subnet mask 255.255.255.255, as we did.
Maybe we're misunderstanding it, maybe it's bugged. Anyway, this seems to be the issue that triggered the stack trace.
We'll investigate further and keep you updated.
The problem was solved by using virtual with access-address. Sadly we somehow skipped this hint in the second post. Thank you for this! We still don't get why this was causing issues even on the main host and why this problem came out of nowhere. Suggestion: maybe it would be helpful to add an is-null check and raise a message instead of just crashing the daemon?
It's now working stably again; all we did was add "virtual" to "access-address NODE_IP". I was thinking further about what we did to our servers. The only thing that came to mind was that we migrated IPs from another server to our Aerospike servers. During this step the network was reinitialized, maybe even the routers at our hoster. After the network had stabilized and even the physical servers had been rebooted (just to be sure), the issue came out of nowhere. I double-checked that our Aerospike config had not changed; we're using version control for our server configs, and it confirmed this. Maybe something went wrong during the IP migration? Or maybe it was wrong the whole time, and because the IP migration caused our hoster's routers to be reinitialized, "virtual" became necessary? The strange thing about it is that it affected both of the servers we were running… Sorry, I don't have a clue; maybe you're able to figure something out. Here's our config:
# Aerospike database configuration file for deployments using XDR.

service {
    user root
    group root
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 1024
    migrate-xmit-hwm 200
    migrate-threads 8
    scan-priority 2000
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
    file /var/log/aerospike/aerospike-crit.log {
        context any critical
    }
    file /var/log/aerospike/aerospike-warn.log {
        context any warning
    }
    console {
        context any info
    }
}

network {
    service {
        address any
        port 3000
        # Set the `access-address` parameter to the IP address of the
        # Docker host. This will allow the server to correctly publish
        # the address which applications and other nodes in the cluster
        # should use when addressing this node.
        access-address NODE_IP virtual
    }
    heartbeat {
        mode mesh      # Send heartbeats using mesh (unicast) protocol
        address any    # IP of the NIC on which this node is listening
                       # to heartbeat
        port 3002      # Port on which this node is listening to
                       # heartbeat
        mesh-seed-address-port NODE1_IP 3002 # IP address for seed node in the cluster
        mesh-seed-address-port NODE2_IP 3002
        interval 150   # Number of milliseconds between heartbeats
        timeout 20     # Number of heartbeat intervals to wait before
                       # timing out a node
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}

namespace ns1 {
    # enable-xdr true    # Enable replication for this namespace.
    # xdr-remote-datacenter REMOTE_DC_2
    replication-factor 2
    memory-size 20G
    default-ttl 0
    single-bin true
    storage-engine device {
        file /opt/aerospike/data/ns1.dat
        data-in-memory false    # Do not keep data in memory in addition to the file.
        filesize 390G
    }
}

namespace ns2 {
    replication-factor 2
    memory-size 1G
    default-ttl 0
    single-bin true
    storage-engine device {
        file /opt/aerospike/data/ns2.dat
        filesize 10G
        cold-start-empty true
    }
}

namespace ns3 {
    # enable-xdr true    # Enable replication for this namespace.
    # xdr-remote-datacenter REMOTE_DC_2
    replication-factor 2
    memory-size 2G
    default-ttl 0
    single-bin true
    storage-engine device {
        file /opt/aerospike/data/ns3.dat
        data-in-memory true    # Store data in memory in addition to the file.
        filesize 10G
        # cold-start-empty true
    }
}
Absolutely! We used the same configuration as we used in Docker (we're using --net=host), on all participants of the cluster. The only thing that was replaced was the IPs of the nodes.
Would it be possible to get the network portion of the aerospike.conf for each of the Docker containers? Feel free to mask the IPs by replacing the first 3 octets with an X (i.e.: X.X.X.Y).
You should also be able to add all 3 of the IPs as seed nodes to each of the config files:
mesh-seed-address-port NODE1_IP 3002 # IP address for seed node in the cluster
mesh-seed-address-port NODE2_IP 3002
mesh-seed-address-port NODE3_IP 3002
I'm also assuming that NODE1_IP, NODE2_IP, and NODE3_IP are the IPs used in the 3 different access-address virtual entries. Are all the mesh seeds using port 3002, or do you have some using other ports?
To check the IPs published by each node of the cluster, you could run the Aerospike tools Docker container.
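For example, something along these lines should print the service (access) address each node publishes; NODE1_IP is a placeholder, and the image name assumes the standard aerospike-tools image:

docker run --rm aerospike/aerospike-tools asinfo -h NODE1_IP -v service

Repeating it with -h NODE2_IP and -h NODE3_IP lets you verify that the returned host:port pairs match the access-address entries in each node's config.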
I guess the reason why your tools container is not able to connect is that it's not getting --net=host passed. Anyway, we're running all containers with --net=host ~
Update: We figured out that there were missing iptables rules for running Docker containers without --net=host. Anyway, this should not be related to the virtual issue, since we always run our containers with --net=host. Do you think further investigation is necessary? I mean, our environment stabilized after adding virtual. And if you add a == null check (maybe with a critical log?), then Aerospike won't "just" crash in the future. Furthermore, it's now even possible to run the 3rd Docker container with Aerospike on it ~
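For anyone hitting the same symptom without --net=host, the missing rule would be something along these lines (a sketch only; the exact chain and port range depend on your Docker and firewall setup, and this is not necessarily the rule we used):

iptables -I FORWARD -p tcp --dport 3000:3003 -j ACCEPT

This permits forwarded traffic to the Aerospike service, fabric, heartbeat, and info ports when the container traffic crosses the Docker bridge instead of sharing the host's network namespace.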
Good news! We've released Aerospike 3.6.0, which features a number of improvements to batch-read, scan, etc., as well as numerous fixes, including AER-3946.
You can read more about this release in our Aerospike Server CE 3.6.0 release notes and download it here.
Please upgrade and let us know whether you still encounter your issue.