I installed a cluster with Docker Swarm.
First I loaded with 2 DB nodes, and then I tested after scaling to 8 DB nodes.
With 2 nodes, the load generates errors quickly and stops because all nodes are full.
With 8 nodes, the load also generates errors, but only when data is sent to the full nodes (1 or 2 of them).
My data is a 50 MB file.
With asadm I can see the stop-writes flag set to true, with Available at 1 or 0 % and only 15-20 MB used on disk.
At the beginning of the load, Available was at 99%.
I used the Aerospike parameters indicated in this blog.
docker-machine creates a node with 1 GB of RAM by default; I tried increasing it to 1.5 GB but it had no effect.
I have 6 GB of free RAM (10 are used) and 400 GB free on my SSD.
How can I get my system to load the data correctly?
I'm using the latest server version available on Docker Hub: 4.2.0.5.
Below is the content of the .conf file uploaded by the YML to all containers:
# Aerospike database configuration file.

# This stanza must come first.
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }

    # Send log messages to stdout
    console {
        context any info
    }
}

network {
    service {
        address eth0
        port 3000

        # Uncomment the following to set the `access-address` parameter to the
        # IP address of the Docker host. This will allow the server to correctly
        # publish the address which applications and other nodes in the cluster
        # should use when addressing this node.
        # access-address <IPADDR>
    }

    heartbeat {
        address eth0
        # mesh is used for environments that do not support multicast
        mode mesh
        port 3002

        # use asinfo -v 'tip:host=<ADDR>;port=3002' to inform cluster of
        # other mesh nodes
        interval 150
        timeout 10
    }

    fabric {
        address eth0
        port 3001
    }

    info {
        port 3003
    }
}

namespace test {
    replication-factor 2
    memory-size 1G
    default-ttl 5d # 5 days, use 0 to never expire/evict.

    # storage-engine memory

    # To use file storage backing, comment out the line above and use the
    # following lines instead.
    storage-engine device {
        file /opt/aerospike/data/test.dat
        filesize 4G
        data-in-memory true # Store data in memory in addition to file.
    }
}

namespace nfe204 {
    replication-factor 2
    memory-size 1G
    default-ttl 5d # 5 days, use 0 to never expire/evict.

    # storage-engine memory

    # To use file storage backing, comment out the line above and use the
    # following lines instead.
    storage-engine device {
        file /opt/aerospike/data/nfe204.dat
        filesize 4G
        data-in-memory true # Store data in memory in addition to file.
    }
}
But when I access my container, the file /etc/aerospike.conf is not present.
I can find a file at /etc/aerospike/aerospike.conf.
But its content looks like the default config file (my new namespace is not present), even though asadm lists the namespace.
If you followed the blog post, your conf file exists as a Docker secret at /run/secrets/aerospike.conf, and on container start Aerospike is launched with the --config-file parameter pointing at that secret.
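A quick way to verify that, with <aerospike_container> as a placeholder for your container name or ID:

    # Print the secret-mounted config the server was started with
    docker exec <aerospike_container> cat /run/secrets/aerospike.conf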
Stop-writes can occur when device_available_pct drops below min-avail-pct, or when memory is over 90% used.
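For example, you can check the flag and the values that drive it per namespace (run inside a node container; the namespace name test comes from your config, and exact stat names can vary slightly by version):

    # -l prints one statistic per line instead of a semicolon-delimited string
    asinfo -v 'namespace/test' -l | grep -E 'stop_writes|available_pct|memory_used'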
Can you please post the output from asadm -e info?
On the surface, it looks like defragmentation is unable to keep up.
However, if you've used this loader to load the data, then you should not have any defragmentation at all, as all data are written sequentially (no updates/deletes).
We can find out more with some log lines; the last 50 or so lines from the log are enough. Either /var/log/aerospike/aerospike.log inside the container, or docker logs --tail 50 {aerospike_container}.
Lastly, are all 4 Docker nodes/virtual machines on a single machine, like your laptop?
From the write histogram you provided, writes seem to be taking a long time to complete; more than 10% took over 32 ms, and some even took more than 1 second. I'm not sure whether it is your disk or not, so could you set enable-benchmarks-storage to true and rerun this test?
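If it helps, that setting can be applied dynamically, without a restart (namespace test assumed):

    # Enable storage micro-benchmark histograms for the namespace;
    # the histograms are then printed to the Aerospike log
    asinfo -v 'set-config:context=namespace;id=test;enable-benchmarks-storage=true'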
Also, defrag is unable to keep up, which would also indicate that the disk is not able to handle the load.
Why are there so many proxies (proxy (334114,0,4))? Proxies should be relatively rare, with spikes around cluster disruptions. If you are proxying all the time, then a client may not be able to see all nodes.
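One way to watch those counters, assuming asadm can reach the cluster:

    # Filter the per-node statistics down to proxy-related counters
    asadm -e 'show statistics like proxy'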
Why is this node having trouble connecting to the other nodes?
aerospike_aerospikedb.1.sgzk1ai0f95b@nodeleader-1 | Aug 01 2018 21:17:18 GMT: WARNING (socket): (socket.c:746) (repeated:65) Timeout while connecting
Are there supposed to be 4 nodes in this cluster? Only two of them are forming a cluster.
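A quick sketch for checking each node's view from the Docker host, assuming your container names contain aerospike:

    # Every node should report the same cluster_size and cluster_integrity=true
    for c in $(docker ps -q -f name=aerospike); do
        docker exec "$c" asinfo -v 'statistics' -l | grep -E 'cluster_size|cluster_integrity'
    done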
Docker and friends bring a lot of complexity into the system; I would suggest trying to get things running without the extra complexity first. Also, have you verified that your disks can handle the load, using ACT?
There are two containers' worth of logs, 15 minutes apart, interwoven. It seems like you've deleted the entire cluster and recreated it, which is why they're both aerospike_aerospikedb.1, but the next string is different: rsby... vs sgzk....
Proxies/load balancers should not be used with Aerospike. Unless you're using the very latest Java and C clients, and even then it's only useful for discovery, not actual traffic routing.
So if you’re deploying Aerospike onto containers (Docker, Kubernetes, Mesos/Marathon, etc…), you’d either need to use host-based networking, or place your clients inside of the same container environment.
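For example, a single node on host networking could be started like this (the config path is a placeholder; the image tag matches the version you mentioned):

    # --net=host shares the host's interfaces, so the address the server
    # publishes is the host's own and clients can reach it directly
    docker run -d --name aerospike --net=host \
        -v /path/to/aerospike.conf:/etc/aerospike/aerospike.conf \
        aerospike/aerospike-server:4.2.0.5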
I think I get connection errors because each time I reboot my desktop, services are randomly redispatched between nodes.
If everything is on a single machine, then all containers are placed in a stopped state and resumed after reboot. If your Docker Swarm spans multiple machines, then the containers that were brought down with the host will be removed and new replacements created on the surviving host(s).
I use HAProxy to route traffic between my OS and Docker.
Aerospike does not work correctly with proxies/load balancers. Each Aerospike client keeps track of which server node holds each record, and clients connect directly to the server's published address. By forcing a proxied route, client connections cannot reach server nodes directly, and must time out and fall back to the proxy. The proxy then forwards the connection to a random server node, which most likely has to forward the request on to the correct server node.
You're probably seeing loader write failures because the above process takes too long and times out.
So then how can you communicate with Aerospike inside of Docker?
By placing your clients inside of the same Docker environment. Build a simple container with your compiled client code and run it:
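A minimal sketch, assuming your stack's overlay network is named aerospike_default (check with docker network ls), your client image is called my-loader, and it accepts -h/-p arguments for the seed host and port (all of these names are hypothetical):

    # The service name resolves through Docker's internal DNS on the
    # overlay network, so no proxy is needed in between
    docker run --rm --network aerospike_default my-loader \
        -h aerospike_aerospikedb -p 3000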
Also, you're changing your cluster each time, going from 2 nodes to 4 nodes to 7 nodes. Please stop doing that. If your dataset is only 50 MB, then stick with 2 nodes and in-memory only. Once you've figured out how networking should be done, then add persistent storage.