Extremely low TPS in distributed configuration -> XDR


#1

I ran the benchmarks with Aerospike configured on a single server, no replication, and achieved excellent transactions-per-second results. When I configured a new namespace with a replication factor of 2 and added 11 servers across 3 datacenters, the benchmark's transactions per second dropped to almost 0.

I easily get 100Mbps between each of the nodes, there’s no packet loss, and I’ve verified there are no firewall rules affecting connectivity, so I believe the problem is isolated to either my configuration or how Aerospike is handling writes/reads.

Here’s some of my benchmark logs:

>     2015-01-15 14:56:14 INFO write(tps=43 timeouts=0 errors=0) read(tps=33 timeouts=7 errors=0) total(tps=76 timeouts=7 errors=0)
>     2015-01-15 14:56:15 INFO write(tps=9 timeouts=0 errors=0) read(tps=14 timeouts=3 errors=0) total(tps=23 timeouts=3 errors=0)
>     2015-01-15 14:56:16 INFO write(tps=8 timeouts=0 errors=0) read(tps=20 timeouts=1 errors=0) total(tps=28 timeouts=1 errors=0)
>     2015-01-15 14:56:17 INFO write(tps=4 timeouts=0 errors=0) read(tps=4 timeouts=0 errors=0) total(tps=8 timeouts=0 errors=0)
>     2015-01-15 14:56:18 INFO write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
>     2015-01-15 14:56:19 INFO write(tps=2 timeouts=0 errors=0) read(tps=2 timeouts=0 errors=0) total(tps=4 timeouts=0 errors=0)
>     2015-01-15 14:56:20 INFO write(tps=14 timeouts=0 errors=0) read(tps=14 timeouts=0 errors=0) total(tps=28 timeouts=0 errors=0)

Here’s my test namespace (I also tried using memory only, no difference):

>     namespace test {
>       replication-factor 2
>       memory-size 1G
>       storage-engine device {
>         file /opt/aerospike/data/test.data
>         filesize 2G
>         data-in-memory false
>       }
>     }

Any ideas what I could be doing wrong here?


#2

Hi Jon,

If I understand correctly, you have a single cluster that spans multiple datacenters. This is not a normal topology for Aerospike. Aerospike clusters are designed for nodes that sit in close proximity (sub-millisecond latency) to one another. Typically, XDR is used when our clients need data replicated to other datacenters.

Your write latency is going to be high because each write waits for a synchronous record replication to another node before the client receives a response. You can change to an asynchronous replication model by setting write-commit-level-override to master, which should mitigate some of the issue. While the replication is happening, the server holds a row lock that prevents reads or writes to that record until the replication has completed, so you would want to keep hot keys to a minimum. Also, since replication will have more latency, you may find it necessary to raise the number of fabric-workers.
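
As a rough sketch of the two settings mentioned above, applied to the test namespace from the first post: this assumes a server version where write-commit-level-override is a namespace-level parameter and fabric-workers lives in the service context. The fabric-workers value is purely illustrative, not a recommendation, so check both against the configuration reference for your release.

```
service {
  # Illustrative value only (an assumption); tune against observed fabric load.
  fabric-workers 32
}

namespace test {
  replication-factor 2
  memory-size 1G
  # Respond to the client once the master write completes,
  # rather than waiting for the replica write to be acknowledged.
  write-commit-level-override master
  storage-engine device {
    file /opt/aerospike/data/test.data
    filesize 2G
    data-in-memory false
  }
}
```

Note that with the commit level set to master, a client can receive a success response before the replica copy exists, so this trades some durability for lower write latency.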


#3

Thanks for the info! I’ll switch over to XDR and see if that solves the issue for me. Still seems strange that transactions per second would drop this drastically tho.


#4

Transaction threads will block while performing the synchronous replication. With higher latency costs, it becomes more likely that all transaction threads are blocked handling other requests, leaving none free to pick up new ones. So you may be able to mitigate this effect by increasing transaction-threads-per-queue and/or transaction-queues.
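
For illustration, a minimal sketch of where those two parameters would go, assuming a server version that still exposes them in the service context. The numbers are placeholders rather than tuned values; the total number of transaction threads is the product of the two settings.

```
service {
  # Placeholder values (assumptions, not recommendations).
  # Total transaction threads = transaction-queues * transaction-threads-per-queue.
  transaction-queues 8
  transaction-threads-per-queue 8
}
```

More threads only helps while they are mostly waiting on cross-datacenter replication, and it does not reduce the per-write latency itself.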