What affects performance?

benchmark

#1

I cloned the Aerospike server and C client onto a 64-core box with 256 GB RAM running Amazon Linux, built both per the GitHub page instructions, ran the server, and ran the C client benchmark… and I’m seeing in the range of 120k to 150k operations per second. Increasing the number of benchmark threads past 4 does not increase performance linearly.

I haven’t changed the default Aerospike server config; it appears to be in-RAM without persistence. The benchmark program is running over localhost (i.e. it should be fast). I have heard of Aerospike performance of up to 1 million operations per second. Which parameters can I tweak to increase the operations per second in this simple test? And what can I monitor to see whether Aerospike is performing optimally?
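For reference, the rough shape of the invocation I’m using is below. The flag meanings (-k keys, -o object type, -w workload mix, -z client threads) are my reading of the benchmark’s --help for this version, so treat them as assumptions and double-check against your build; I echo the command rather than run it here since it needs the live server.

```shell
# Roughly what I'm running, wrapped in a variable so the command line itself
# is the point (it needs the server from above listening on localhost:3000).
# Flag meanings are my assumptions from `benchmarks --help`:
#   -k keys, -o I = integer values, -w RU,50 = 50% reads / 50% updates,
#   -z client threads
CMD="target/benchmarks -h 127.0.0.1 -p 3000 -n test -k 1000000 -o I -w RU,50 -z 8"
echo "$CMD"
```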


#2

That would not be a proper way to benchmark a database for performance. Aerospike cluster nodes typically run by themselves on a box, because Aerospike is a multi-node, multicore, multithreaded system and will expand to use the available resources. It isn’t deployed in production the way memcached often is, for example (a small local instance running on the same box as your app).

You should set up and configure your cluster, then spin up multiple client instances. In Amazon EC2 you’d likely use r4 nodes for purely in-memory testing, and c4 nodes for the clients. You can use the C client’s benchmark tool, or the Java Client Benchmark application. You’ll be able to use various Linux tools to see if you’re bottlenecked on the server or the client. If it’s on the client side you can use more threads, or move to more client instances.

Read the recommendations for deploying on Amazon EC2. One thing you’ll run up against is networking limitations, detailed in the knowledge base article on using multiple NICs with Aerospike (and a specific discussion of ENIs vs. ENAs for EC2). Amazon ENIs had only one or two transmit/receive queues, which becomes the TPS bottleneck on instances with a large number of cores. There’s also a per-instance packets-per-second limit in Amazon’s network.

With regard to configuration tuning, releases 3.11 and 3.12 added several optimizations for in-memory operations. Make sure your configuration raises partition-tree-sprigs appropriately and correctly sets partition-tree-locks. Depending on your Linux kernel version you may also be able to take advantage of auto-pin cpu.
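As a sketch, those knobs live in aerospike.conf roughly as follows. The values here are illustrative, not recommendations, and the namespace name test is just an example; check the configuration reference for your server version.

```
service {
    auto-pin cpu                  # 3.12+; needs a recent kernel
}

namespace test {
    replication-factor 2
    memory-size 8G
    partition-tree-sprigs 4096    # power of 2; more sprigs, less tree contention
    partition-tree-locks 256      # power of 2, up to 256
    storage-engine memory
}
```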

When you do benchmark, you’ll have to look at your latencies with asadm and asloglatency to check if you’re bottlenecked at the server or the client.

In general, you probably want to read up on how Aerospike has done benchmarks in the past, and read the Speed at Scale database manifesto, as it explains the philosophy of how to do those correctly, simulating real-world use cases and deployments.


#3

Thanks for the quick reply. FYI, I’m not trying to replicate an Aerospike benchmark done in the past, or to earnestly benchmark Aerospike under lab conditions. I just wanted to run the benchmark tool as a quick-and-dirty test to show that the Aerospike server and C client are correctly built and running, and to give a ‘taster’ of performance.

Also, if anything, I would expect running the server in in-RAM mode with the C client on the same box to give the highest possible performance, because localhost networking is generally faster, with bigger buffers, than the Amazon NICs, which are also generally rate limited. And there is zero overhead from Aerospike doing other cluster work like XDR with other hosts, because there are no other Aerospike hosts in the cluster, so cluster interaction should not be detracting from performance in a properly structured benchmark.

I would hope that performance increases linearly as more boxes are added to an Aerospike cluster, though things like not-in-RAM mode, replication, XDR, UDF complexity, etc. will get in the way of that linear scaling. Therefore, I’m naively expecting that a very simple test on one box, in-RAM, over localhost networking, and without any of the detractors mentioned, is going to be the fastest [1], with performance only getting worse from there. Or what am I missing?

On the 64-core Amazon box, running the benchmarks with 4 threads, each thread mysteriously uses about 50% CPU. I’m guessing there’s a bottleneck, which is why it’s not using 100% CPU?

 54623 ec2-user  20   0  410m 4232 3720 R 53.0  0.0   1:02.30 benchmarks
 54626 ec2-user  20   0  410m 4232 3720 R 53.0  0.0   1:02.18 benchmarks
 54624 ec2-user  20   0  410m 4232 3720 R 51.0  0.0   1:02.35 benchmarks
 54625 ec2-user  20   0  410m 4232 3720 S 51.0  0.0   1:02.34 benchmarks
 54388 ec2-user  20   0 7899m 1.6g 5588 R 45.1  0.6   0:53.35 asd
 54389 ec2-user  20   0 7899m 1.6g 5588 S 45.1  0.6   0:55.85 asd
 54390 ec2-user  20   0 7899m 1.6g 5588 R 44.1  0.6   0:53.25 asd
 54391 ec2-user  20   0 7899m 1.6g 5588 S 43.2  0.6   0:52.61 asd

Running the same benchmarks with 8 threads results in similar performance, although twice as many threads are used; CPU for each benchmarks thread goes down a little. Personally, I never really trust processes that are not running at 100% CPU, because the sampling technique used to report the sub-100% CPU rate can often be misleading. So it’s definitely puzzling why the benchmarks threads are sleeping so much. Any ideas?

 54660 ec2-user  20   0  698m 4288 3772 R 51.3  0.0   1:37.36 benchmarks
 54655 ec2-user  20   0  698m 4288 3772 S 48.4  0.0   1:37.38 benchmarks
 54657 ec2-user  20   0  698m 4288 3772 S 48.4  0.0   1:37.76 benchmarks
 54659 ec2-user  20   0  698m 4288 3772 R 48.4  0.0   1:36.80 benchmarks
 54656 ec2-user  20   0  698m 4288 3772 S 47.4  0.0   1:37.18 benchmarks
 54658 ec2-user  20   0  698m 4288 3772 R 47.4  0.0   1:37.44 benchmarks
 54662 ec2-user  20   0  698m 4288 3772 R 46.4  0.0   1:37.33 benchmarks
 54661 ec2-user  20   0  698m 4288 3772 S 45.4  0.0   1:36.70 benchmarks
 54397 ec2-user  20   0 7921m 1.6g 5588 R 45.4  0.6   1:36.10 asd
 54396 ec2-user  20   0 7921m 1.6g 5588 R 44.4  0.6   1:37.59 asd
 54399 ec2-user  20   0 7921m 1.6g 5588 R 44.4  0.6   1:36.80 asd
 54400 ec2-user  20   0 7921m 1.6g 5588 S 44.4  0.6   1:35.85 asd
 54394 ec2-user  20   0 7921m 1.6g 5588 R 42.4  0.6   1:36.73 asd
 54395 ec2-user  20   0 7921m 1.6g 5588 R 42.4  0.6   1:36.76 asd
 54393 ec2-user  20   0 7921m 1.6g 5588 S 32.6  0.6   1:05.46 asd
 54398 ec2-user  20   0 7921m 1.6g 5588 R 32.6  0.6   1:13.79 asd
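One explanation I can partly convince myself of: the benchmark threads are synchronous, so each one blocks on the socket waiting for a reply before issuing the next operation, and per-thread throughput is capped at 1/latency. A back-of-envelope sketch (the round-trip number is purely an assumed figure, not something I measured):

```python
# Synchronous client: total throughput ~= threads / round_trip_latency.
# The 30 microsecond localhost round trip is an assumption for illustration.
threads = 4
round_trip_s = 30e-6               # assumed localhost round-trip time
tps_per_thread = 1 / round_trip_s  # each thread is latency-bound
total_tps = threads * tps_per_thread
print(int(total_tps))              # 133333 -- same ballpark as the 120k-150k observed
```

If that model is right, the threads sitting near 50% CPU would just mean they spend roughly half their time blocked waiting on replies, which would also explain why doubling the threads doesn’t double throughput once the server side saturates.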

FYI, I tried to build and install asadm, but it looks like there is a bug, according to its GitHub issue, whereby it does not install on Amazon Linux.

I read the GitHub blurb for asloglatency but am not sure how it will be useful. It says it reports latencies by munging the output from aerospike.log, but when I look in my aerospike.log file I see no latency info for it to munge. Does this info need to be enabled somehow in the config? It does say that the asinfo tool can be used to manipulate the latency reporting, but I haven’t been able to find asinfo on GitHub to build and install it. And if the open source is not available, I’m unclear which binary is compatible with Amazon Linux (if any). Do you have any tips for installing asinfo on Amazon Linux?

[1] https://serverfault.com/questions/234223/how-fast-is-127-0-0-1


#4

Most if not all of the macro histograms that asloglatency parses are enabled by default. There are various other micro-benchmarks that need to be enabled via static/dynamic configuration.
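For example, on a 3.9+ server you’d enable the per-namespace micro-benchmark histograms dynamically with asinfo. The enable-benchmarks-* parameter names and the namespace name test here are from memory, so verify them against the configuration reference for your version:

```
asinfo -v "set-config:context=namespace;id=test;enable-benchmarks-read=true"
asinfo -v "set-config:context=namespace;id=test;enable-benchmarks-write=true"
```

The same parameters can be set statically in aerospike.conf if you want them to survive a restart.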

They appear in the following form in the logs:

Taken from: http://www.aerospike.com/docs/reference/serverlogmessages

histogram dump: {ns-name}-<hist-name> (1344911766 total) msec
(00: 1262539302) (01: 0044998665) (02: 0013431778) (03: 0007273116)
(04: 0004299011) (05: 0003086466) (06: 0002182478) (07: 0001854797)
(08: 0000312272) (09: 0000370715) (10: 0000643337) (11: 0001045861)
(12: 0001991430) (13: 0000882538)
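The bucket index is, as I understand it (worth verifying against the log reference), a power-of-two latency scale: bucket 00 counts ops under 1 msec, and bucket N counts ops taking roughly 2^(N-1) to 2^N msec. A quick sketch of pulling the counts out of a dump like the one above:

```python
import re

# The histogram dump lines from the log excerpt above.
dump = """(00: 1262539302) (01: 0044998665) (02: 0013431778) (03: 0007273116)
(04: 0004299011) (05: 0003086466) (06: 0002182478) (07: 0001854797)
(08: 0000312272) (09: 0000370715) (10: 0000643337) (11: 0001045861)
(12: 0001991430) (13: 0000882538)"""

# bucket index -> count; bucket 00 is assumed to mean "under 1 msec"
buckets = {int(b): int(c) for b, c in re.findall(r"\((\d+): (\d+)\)", dump)}
total = sum(buckets.values())
over_1ms = total - buckets[0]

print(total)                             # 1344911766, matching the dump header
print(round(100 * over_1ms / total, 1))  # percent of ops slower than 1 msec
```

asloglatency does essentially this across successive dump lines, reporting the percentage of ops over each threshold per time slice.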

#5

You can log in to asadm and run ‘show latency’, or just run ‘asloglatency -h reads’ and it should tail the local log.