Understanding asbenchmark parameters (10x performance difference)

Hi,

Given:

  1. Aerospike Community Edition 4.5.1.5
  • one node
  • namespace configuration
    namespace test {
        replication-factor 1
        memory-size 24G
        default-ttl 30d
        storage-engine device {
            device /dev/sdd
            scheduler-mode noop
            write-block-size 128K
        }
    }
    
  2. asbenchmark 4.3.1, which has been
  • installed on a separate dedicated server

The strange issue is that when I’m running the benchmark with the following parameters

asbenchmark \
    --async \
    --bins 1 \
    --hosts 192.168.0.5 \
    --port 3000 \
    --namespace test \
    --set testset \
    --keys 10 \
    --keyType String \
    --keylength 24 \
    -netty \
    -nettyEpoll \
    -objectSpec B:1024 \
    --eventLoops 4 \
    --workload RR,100,0 \
    --connPoolsPerNode 100 \
    --asyncMaxCommands 1000 \
    --replica any

… I get the following output

2019-06-03 19:15:18.122 write(tps=0 timeouts=0 errors=0) read(tps=45007 timeouts=0 errors=0) total(tps=45007 timeouts=0 errors=0)
2019-06-03 19:15:19.122 write(tps=0 timeouts=0 errors=0) read(tps=45266 timeouts=0 errors=0) total(tps=45266 timeouts=0 errors=0)
2019-06-03 19:15:20.123 write(tps=0 timeouts=0 errors=0) read(tps=47231 timeouts=0 errors=0) total(tps=47231 timeouts=0 errors=0)
2019-06-03 19:15:21.123 write(tps=0 timeouts=0 errors=0) read(tps=45281 timeouts=0 errors=0) total(tps=45281 timeouts=0 errors=0)

If I change the number of keys to a reasonably large value, e.g. 1000000, like the following

asbenchmark \
    --async \
    --bins 1 \
    --hosts 192.168.0.5 \
    --port 3000 \
    --namespace test \
    --set testset \
    --keys 1000000 \
    --keyType String \
    --keylength 24 \
    -netty \
    -nettyEpoll \
    -objectSpec B:1024 \
    --eventLoops 4 \
    --workload RR,100,0 \
    --connPoolsPerNode 100 \
    --asyncMaxCommands 1000 \
    --replica any

… I get the following output

2019-06-03 19:17:33.231 write(tps=0 timeouts=0 errors=0) read(tps=421654 timeouts=0 errors=0) total(tps=421654 timeouts=0 errors=0)
2019-06-03 19:17:34.231 write(tps=0 timeouts=0 errors=0) read(tps=445527 timeouts=0 errors=0) total(tps=445527 timeouts=0 errors=0)
2019-06-03 19:17:35.232 write(tps=0 timeouts=0 errors=0) read(tps=438792 timeouts=0 errors=0) total(tps=438792 timeouts=0 errors=0)
2019-06-03 19:17:36.232 write(tps=0 timeouts=0 errors=0) read(tps=438171 timeouts=0 errors=0) total(tps=438171 timeouts=0 errors=0)

So why does increasing the number of keys in the benchmark lead to an order-of-magnitude better results?

P.S. I experience exactly the same behaviour for storage-engine=memory as well.

I’m not as familiar with the Java benchmark, but it sounds like you’re comparing a performance test using 10 keys vs 1000000 keys. There are a few problems with using such a small number of keys:

  • While a record is being operated on, a lock is placed on it. Two operations can’t hit the same record at the same time, which limits throughput.
  • If you read the same few records over and over, you are not spreading the load out at all.
  • This is likely causing hotkey contention: Hot Key error code 14
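The contention above can be sketched with a toy model (this is not Aerospike code — the key space sizes mirror the benchmark, but the 1 ms lock-hold time and 64-thread count are made-up illustration values): one lock per "record", with threads grabbing a random key's lock, roughly like the per-record lock the server holds while a transaction is in flight. With only 10 keys, at most 10 threads can make progress at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantLock;

public class HotKeyDemo {

    // Run `threads` workers for `runMillis` ms against `keySpace` records,
    // each guarded by its own lock; return how many operations completed.
    static long run(int keySpace, int threads, long runMillis) throws InterruptedException {
        ReentrantLock[] locks = new ReentrantLock[keySpace];
        for (int i = 0; i < keySpace; i++) locks[i] = new ReentrantLock();

        AtomicLong ops = new AtomicLong();
        long deadline = System.nanoTime() + runMillis * 1_000_000L;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                while (System.nanoTime() < deadline) {
                    ReentrantLock lock = locks[rnd.nextInt(keySpace)];
                    lock.lock();
                    try {
                        LockSupport.parkNanos(1_000_000); // ~1 ms "transaction"
                    } finally {
                        lock.unlock();
                    }
                    ops.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(runMillis + 5_000, TimeUnit.MILLISECONDS);
        return ops.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long few  = run(10,        64, 1_000);
        long many = run(1_000_000, 64, 1_000);
        System.out.println("10 keys:        " + few  + " ops");
        System.out.println("1,000,000 keys: " + many + " ops");
        // With 10 keys at most 10 threads hold a lock at once; with a large
        // key space collisions are rare and nearly all 64 threads progress.
    }
}
```

The 'many' count comes out several times higher than 'few' on a multi-core machine, for the same reason the 1000000-key benchmark run reports far higher tps.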

Is there a particular use case where you expect to only be using 10 records?


Is there a particular use case where you expect to only be using 10 records?

Actually, there isn’t. But this is just a simple benchmark, and I’d like to understand the difference, because the client library reports no errors and there are no errors in the server logs either.

You typically won’t get the key busy error for reads. To get a key busy error, you either need to be running in SC/linearize mode, or the cluster needs to be disrupted while either SC or read-duplicate-resolution is enabled.

I believe @Albot’s explanation for the performance is accurate: with only 10 records, there are a lot of requests contending for a few locks. You could verify this with perf.

For the 10-record case, also test performance with read-page-cache true in the device storage configuration. :slight_smile:
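For reference, a sketch of your namespace block with that option added (same device layout as in your original config; check that read-page-cache is available in your server version before relying on it):

```
namespace test {
    replication-factor 1
    memory-size 24G
    default-ttl 30d
    storage-engine device {
        device /dev/sdd
        scheduler-mode noop
        write-block-size 128K
        read-page-cache true   # let the OS page cache serve repeated reads
    }
}
```

With only 10 records being read repeatedly, they should stay resident in the page cache, so this mainly isolates whether device reads contribute to the gap.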