I ran the benchmarks with Aerospike configured on a single server, no replication, and got excellent transactions-per-second results. When I configured a new namespace with a replication factor of 2 and added 11 servers across 3 datacenters, the benchmark's transactions per second dropped to almost 0.
I easily get 100 Mbps between the nodes with no packet loss, and I've verified there are no firewall rules affecting connectivity, so I believe the problem is isolated to either my configuration or to how Aerospike is handling reads and writes.
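For completeness, here's a rough sketch of how I can also check round-trip latency between the nodes, in case latency rather than raw bandwidth is the limiting factor. This is only a sketch: the hostnames are placeholders, and it simply times TCP connects to the Aerospike service port rather than doing a proper network test.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Rough inter-node latency check: time TCP connects to each node's
    // Aerospike service port (3000). Hostnames below are placeholders.
    public class RttCheck {
        public static void main(String[] args) throws IOException {
            String[] nodes = {"node1.dc1.example", "node2.dc2.example", "node3.dc3.example"};
            int samples = 10;
            for (String host : nodes) {
                long totalNanos = 0;
                for (int i = 0; i < samples; i++) {
                    long t0 = System.nanoTime();
                    try (Socket s = new Socket()) {
                        s.connect(new InetSocketAddress(host, 3000), 1000); // 1s connect timeout
                    }
                    totalNanos += System.nanoTime() - t0;
                }
                System.out.printf("%s: avg connect time %.2f ms%n", host, totalNanos / samples / 1e6);
            }
        }
    }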
Here are some of my benchmark logs:
> 2015-01-15 14:56:14 INFO write(tps=43 timeouts=0 errors=0) read(tps=33 timeouts=7 errors=0) total(tps=76 timeouts=7 errors=0)
> 2015-01-15 14:56:15 INFO write(tps=9 timeouts=0 errors=0) read(tps=14 timeouts=3 errors=0) total(tps=23 timeouts=3 errors=0)
> 2015-01-15 14:56:16 INFO write(tps=8 timeouts=0 errors=0) read(tps=20 timeouts=1 errors=0) total(tps=28 timeouts=1 errors=0)
> 2015-01-15 14:56:17 INFO write(tps=4 timeouts=0 errors=0) read(tps=4 timeouts=0 errors=0) total(tps=8 timeouts=0 errors=0)
> 2015-01-15 14:56:18 INFO write(tps=0 timeouts=0 errors=0) read(tps=0 timeouts=0 errors=0) total(tps=0 timeouts=0 errors=0)
> 2015-01-15 14:56:19 INFO write(tps=2 timeouts=0 errors=0) read(tps=2 timeouts=0 errors=0) total(tps=4 timeouts=0 errors=0)
> 2015-01-15 14:56:20 INFO write(tps=14 timeouts=0 errors=0) read(tps=14 timeouts=0 errors=0) total(tps=28 timeouts=0 errors=0)
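For context, my understanding is that the benchmark is doing plain synchronous reads and writes against the cluster, roughly like the minimal Java client sketch below. The seed host, set name, key count, and timeout value are placeholders, not my exact benchmark parameters.

    import com.aerospike.client.AerospikeClient;
    import com.aerospike.client.Bin;
    import com.aerospike.client.Key;
    import com.aerospike.client.Record;
    import com.aerospike.client.policy.WritePolicy;

    // Minimal synchronous write/read loop against the "test" namespace.
    // Seed host, set name, key count and timeout are placeholders.
    public class MiniBench {
        public static void main(String[] args) {
            AerospikeClient client = new AerospikeClient("10.0.0.1", 3000);
            WritePolicy writePolicy = new WritePolicy();
            writePolicy.timeout = 50; // ms; assumed to roughly match the benchmark's timeout

            int ops = 1000;
            long start = System.currentTimeMillis();
            for (int i = 0; i < ops; i++) {
                Key key = new Key("test", "testset", i);
                client.put(writePolicy, key, new Bin("bin1", i)); // synchronous write
                Record record = client.get(null, key);            // synchronous read
            }
            long elapsedMs = Math.max(1, System.currentTimeMillis() - start);
            System.out.println("ops/sec: " + (ops * 2 * 1000L / elapsedMs));
            client.close();
        }
    }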
Here's my test namespace configuration (I also tried memory-only storage; there was no difference):
    namespace test {
        replication-factor 2
        memory-size 1G

        storage-engine device {
            file /opt/aerospike/data/test.data
            filesize 2G
            data-in-memory false
        }
    }
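To rule out a mismatch between what I configured and what the cluster is actually running, I can query a node for the effective namespace settings. Below is a small sketch using the Java client's Info request (the hostname is a placeholder); the same data is also available via the asinfo tool.

    import com.aerospike.client.Info;

    // Sketch: query one node for the "test" namespace's effective settings and
    // print anything related to the replication factor. Hostname is a placeholder.
    public class NamespaceInfo {
        public static void main(String[] args) {
            String reply = Info.request("10.0.0.1", 3000, "namespace/test");
            // The reply is a semicolon-separated list of name=value pairs.
            for (String pair : reply.split(";")) {
                if (pair.contains("repl")) {
                    System.out.println(pair);
                }
            }
        }
    }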
Any ideas what I could be doing wrong here?