Performance slowdown on a multi-node cluster with replication enabled

Hello,

We are running our application servers and database inside Docker/Podman containers. Each container hosts one server with its own DB, and our tests run up to 4 servers at a time.

These servers accept tasks/jobs, and we measure job throughput from queue to finish. All the tasks/jobs are simple date commands with a negligible CPU/memory footprint, and our test tries to capture how many such commands our servers can execute per second. The servers read from and write to the DB when a task is queued, when it runs, and when it finishes. There are no complex queries either.

We are evaluating Aerospike as a distributed DB to see whether it meets our high-availability requirement. The configuration we have is:

heartbeat {

    mode mesh
    port 3002 # Heartbeat port for this node.

    mesh-seed-address-port it03 3002
    mesh-seed-address-port it02 3002
    mesh-seed-address-port it01 3002
    mesh-seed-address-port it00 3002

    interval 150
    timeout 10

}

namespace pbs {

    replication-factor 2
    memory-size 4G

    storage-engine memory

}

When we hosted all 4 containers on a single machine (with ample resources, for example 88 cores and 256GB RAM), we saw the expected performance. But when we moved each container to its own physical machine, we saw a major performance drop.

Here are the sample numbers for reference:

All servers+containers on the same host/machine

1 server+container - 1144.84/s

2 servers+containers - 1950.19/s

3 servers+containers - 2831.50/s

4 servers+containers - 3691.01/s

One host/machine per server

1 server+container - 1181.28/s

2 servers+containers - 542.21/s

3 servers+containers - 719.38/s

4 servers+containers - 977.64/s
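To make the gap easier to see, here is a small Python sketch that compares the two sets of numbers above. It only uses the figures from this post; the variable and function names are my own.

```python
# Throughput (jobs/s) as reported above, keyed by number of servers+containers.
same_host = {1: 1144.84, 2: 1950.19, 3: 2831.50, 4: 3691.01}
separate_hosts = {1: 1181.28, 2: 542.21, 3: 719.38, 4: 977.64}


def relative_throughput(same, separate):
    """Ratio of multi-host throughput to single-host throughput per server count."""
    return {n: separate[n] / same[n] for n in sorted(same)}


ratios = relative_throughput(same_host, separate_hosts)
for n, ratio in ratios.items():
    print(f"{n} server(s): multi-host runs at {ratio:.0%} of the single-host rate")
```

With a single server the two setups are on par; from 2 servers onward the multi-host setup delivers roughly a quarter of the single-host throughput.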

When I set replication-factor to 1, I saw the same numbers as in the single-host, multiple-container setup. Performance drops only when replication-factor is greater than 1.

I suspect this is a network latency issue between containers running on different hosts. I looked through the documentation for tuning options that could help but couldn’t find any.
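One way to check the latency suspicion is to time TCP connections between the hosts. The following is a minimal Python sketch, not an Aerospike tool: the helper names are hypothetical, and the timing includes name resolution as well as the TCP handshake, so treat the result as a rough upper bound on one network round trip.

```python
import socket
import statistics
import time


def tcp_connect_rtt_ms(host, port, samples=5, timeout=2.0):
    """Return TCP connect times in milliseconds for `samples` attempts to host:port."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection resolves the name and completes the TCP handshake,
        # which costs roughly one network round trip.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        rtts.append(elapsed * 1000.0)
    return rtts


def summarize(rtts):
    """Reduce a list of timings to min/median/max for quick comparison."""
    return {"min": min(rtts), "median": statistics.median(rtts), "max": max(rtts)}
```

For example, running `summarize(tcp_connect_rtt_ms("it01", 3002))` from it00 (host names and port taken from the heartbeat config above, assuming the port is reachable from where the script runs) and comparing it with the same measurement between containers on a single host would show how much extra delay each cross-host hop adds.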

Is there a way to improve read/write performance on a multi-host setup?

Thanks.

Hey @raoash, there are some good recommendations in this thread, which I’d suggest as a starting point: Write performance in multi-node clusters? - #11 by kporter

Please give them a try and let us know what works for you.