Understanding asbenchmark parameters (10x performance difference)

Hi,

Given:

  1. Aerospike Community Edition 4.5.1.5
  • one node
  • namespace configuration
    namespace test {
        replication-factor 1
        memory-size 24G
        default-ttl 30d
        storage-engine device {
            device /dev/sdd
            scheduler-mode noop
            write-block-size 128K
        }
    }
    
  2. asbenchmark 4.3.1, which has been
  • installed on a separate dedicated server

The strange issue is that when I’m running the benchmark with the following parameters

asbenchmark \
    --async \
    --bins 1 \
    --hosts 192.168.0.5 \
    --port 3000 \
    --namespace test \
    --set testset \
    --keys 10 \
    --keyType String \
    --keylength 24 \
    -netty \
    -nettyEpoll \
    -objectSpec B:1024 \
    --eventLoops 4 \
    --workload RR,100,0 \
    --connPoolsPerNode 100 \
    --asyncMaxCommands 1000 \
    --replica any

… I get the following output

2019-06-03 19:15:18.122 write(tps=0 timeouts=0 errors=0) read(tps=45007 timeouts=0 errors=0) total(tps=45007 timeouts=0 errors=0)
2019-06-03 19:15:19.122 write(tps=0 timeouts=0 errors=0) read(tps=45266 timeouts=0 errors=0) total(tps=45266 timeouts=0 errors=0)
2019-06-03 19:15:20.123 write(tps=0 timeouts=0 errors=0) read(tps=47231 timeouts=0 errors=0) total(tps=47231 timeouts=0 errors=0)
2019-06-03 19:15:21.123 write(tps=0 timeouts=0 errors=0) read(tps=45281 timeouts=0 errors=0) total(tps=45281 timeouts=0 errors=0)

If I change the number of keys to a reasonably large value, e.g. 1000000, like the following

asbenchmark \
    --async \
    --bins 1 \
    --hosts 192.168.0.5 \
    --port 3000 \
    --namespace test \
    --set testset \
    --keys 1000000 \
    --keyType String \
    --keylength 24 \
    -netty \
    -nettyEpoll \
    -objectSpec B:1024 \
    --eventLoops 4 \
    --workload RR,100,0 \
    --connPoolsPerNode 100 \
    --asyncMaxCommands 1000 \
    --replica any

… I get the following output

2019-06-03 19:17:33.231 write(tps=0 timeouts=0 errors=0) read(tps=421654 timeouts=0 errors=0) total(tps=421654 timeouts=0 errors=0)
2019-06-03 19:17:34.231 write(tps=0 timeouts=0 errors=0) read(tps=445527 timeouts=0 errors=0) total(tps=445527 timeouts=0 errors=0)
2019-06-03 19:17:35.232 write(tps=0 timeouts=0 errors=0) read(tps=438792 timeouts=0 errors=0) total(tps=438792 timeouts=0 errors=0)
2019-06-03 19:17:36.232 write(tps=0 timeouts=0 errors=0) read(tps=438171 timeouts=0 errors=0) total(tps=438171 timeouts=0 errors=0)

So why does increasing the number of keys in the benchmark lead to an order-of-magnitude better results?

P.S. I experience exactly the same behaviour for storage-engine=memory as well.

I’m not as familiar with the Java benchmark, but it sounds like you’re comparing a performance test using 10 keys vs 1000000 keys. There are a few problems with using such a small number of keys:

  • While a record is being operated on, a lock is placed on it. Two operations can’t hit the same record at the same time, which limits throughput.
  • If you read the same few records over and over, you are not spreading the load out at all.
  • This is likely causing hotkey contention: Hot Key error code 14
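The contention above can be sketched with a toy model (this is not Aerospike code — the key space sizes mirror the benchmark, but the 1 ms lock-hold time and 64-thread count are made-up illustration values): one lock per "record", with threads grabbing a random key's lock, roughly like the per-record lock the server holds while a transaction is in flight. With only 10 keys, at most 10 threads can make progress at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantLock;

public class HotKeyDemo {

    // Run `threads` workers for `runMillis` ms against `keySpace` records,
    // each guarded by its own lock; return how many operations completed.
    static long run(int keySpace, int threads, long runMillis) throws InterruptedException {
        ReentrantLock[] locks = new ReentrantLock[keySpace];
        for (int i = 0; i < keySpace; i++) locks[i] = new ReentrantLock();

        AtomicLong ops = new AtomicLong();
        long deadline = System.nanoTime() + runMillis * 1_000_000L;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                while (System.nanoTime() < deadline) {
                    ReentrantLock lock = locks[rnd.nextInt(keySpace)];
                    lock.lock();
                    try {
                        LockSupport.parkNanos(1_000_000); // ~1 ms "transaction"
                    } finally {
                        lock.unlock();
                    }
                    ops.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(runMillis + 5_000, TimeUnit.MILLISECONDS);
        return ops.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long few  = run(10,        64, 1_000);
        long many = run(1_000_000, 64, 1_000);
        System.out.println("10 keys:        " + few  + " ops");
        System.out.println("1,000,000 keys: " + many + " ops");
        // With 10 keys at most 10 threads hold a lock at once; with a large
        // key space collisions are rare and nearly all 64 threads progress.
    }
}
```

The 'many' count comes out several times higher than 'few' on a multi-core machine, for the same reason the 1000000-key benchmark run reports far higher tps.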

Is there a particular use case where you expect to only be using 10 records?


Is there a particular use case where you expect to only be using 10 records?

Actually, there isn’t. But this is just a simple benchmark, and I’d like to understand the difference, because the client library reports no errors and there are no errors in the server logs either.

You typically won’t get the key busy error for reads. To get a key busy error, you either need to be running in SC/linearize mode, or the cluster needs to be disrupted while either SC or read-duplicate-resolution is enabled.

I believe @Albot’s explanation for the performance is accurate: with only 10 records, there are a lot of requests contending for a few locks. You could verify this with perf.

For the 10-record case, also test performance with read-page-cache true in the device storage configuration. :slight_smile:
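For reference, a sketch of your namespace block with that option added (same device layout as in your original config; check that read-page-cache is available in your server version before relying on it):

```
namespace test {
    replication-factor 1
    memory-size 24G
    default-ttl 30d
    storage-engine device {
        device /dev/sdd
        scheduler-mode noop
        write-block-size 128K
        read-page-cache true   # let the OS page cache serve repeated reads
    }
}
```

With only 10 records being read repeatedly, they should stay resident in the page cache, so this mainly isolates whether device reads contribute to the gap.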