Unable to max out CPU in KVM environment

benchmark

#1

I am currently benchmarking the performance of the Aerospike database in a VM environment.

The setup is as follows:

  • Storage-engine - In-memory
  • Cluster size - 2 nodes
  • Heartbeat type - Multicast (tried Mesh as well)
  • # of network interfaces - 2 * 1 Gbps (1 for client-server and another for cluster)

  • Other config details - the service-threads and transaction-queues parameters match the # of vCPUs available.
  • Client - Aerospike benchmark utility. Tried a varying number of threads, from 20 up to 700. Data size - strings of length 100 and 500. (Tried up to 3 clients with 4 vCPUs each.)
  • Operation - 100% write

I am running the tests on an OpenStack public cloud. These are our observations:

No. of vCPUs / TPS / Server-side CPU utilization (%)


  • 2 / 22-23k / 180
  • 4 / 28-29k / 245
  • 8 / 60-62k / 520

Neither the client vCPUs nor the network was the bottleneck.

I also tried a similar setup in my local lab, and the results were similar. In the local setup, I even tried dedicating separate CPUs to network I/O and to Aerospike using the smp_affinity setting and taskset, but the observations did not change.

Questions -

  1. Apart from “service-threads” and “transaction-queues”, is there any other config parameter that needs to be tweaked?
  2. What other reasons could there be for not being able to max out the CPU with a higher number of vCPUs?

#2

In general, there are more factors that may affect VM environments, but I am surprised that you are observing the same behavior in your local (non-VM) setup too.

When we run our benchmarks in a non-VM environment on an 8-CPU machine, we are able to drive CPU usage up to 700%. One important question here is: what exactly are you accounting for? Do you also include system time, or are the numbers you quoted (100 - idle%)? If you can share the output of the top command from both the VM and non-VM environments, it will help.
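
To make the accounting question concrete, here is a minimal Python sketch that splits CPU time into user, system, idle, steal and so on from two samples of the aggregate `cpu` line in `/proc/stat` on Linux. The helper names (`parse_cpu_line`, `breakdown`, `sample_breakdown`) are made up for illustration and are not part of any Aerospike tooling. Note that these percentages are averaged over all CPUs (0 to 100), so multiply by the number of CPUs to compare them with a per-process figure such as 700% in top.

```python
import time

# First eight counters of the aggregate "cpu" line in /proc/stat (in jiffies).
# Guest time is already included in "user", so it is not counted separately.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Very old kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def breakdown(before, after):
    """Return each field's share (in %) of the time elapsed between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample_breakdown(interval=1.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return the breakdown."""
    with open(path) as f:
        first = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_line(f.readline())
    return breakdown(first, second)
```

Calling `sample_breakdown()` on a server node while the benchmark is running shows whether the quoted figure includes system and softirq time. In a VM, a non-trivial `steal` share would mean the hypervisor is not giving the guest all the CPU time it asks for.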

Three 4-vCPU clients may not be able to saturate a 2-node cluster with 8 CPUs per node. You can try adding more clients and see whether you can push more throughput. Note that on the 2-vCPU server nodes the CPU utilization is high relative to the number of cores, and it drops as the number of cores on the server increases. Thread context switching could also be a factor.
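
To put numbers on that drop, the figures from the first post can be normalized per vCPU. This is only a rough sketch: the TPS values are the midpoints of the reported ranges (an assumption on my part), and `per_vcpu` is a made-up helper name.

```python
# (vCPUs, TPS midpoint of the reported range, server-side CPU utilization in %)
runs = [(2, 22_500, 180), (4, 28_500, 245), (8, 61_000, 520)]


def per_vcpu(vcpus, tps, cpu_pct):
    """Return (CPU utilization per vCPU in %, TPS per vCPU)."""
    return cpu_pct / vcpus, tps / vcpus


for vcpus, tps, cpu_pct in runs:
    util, tps_each = per_vcpu(vcpus, tps, cpu_pct)
    print(f"{vcpus} vCPUs: {util:.1f}% per vCPU, {tps_each:.0f} TPS per vCPU")
```

This works out to about 90% per vCPU with 2 vCPUs, but only about 61% with 4 and 65% with 8, so the larger servers are leaving roughly a third of each core unused.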

Can you try a 100% read load too? I see that you are running a 100% write load. Aerospike replicates synchronously over the network, so the replication component may not be keeping up. A 100% read load will tell us whether that is the case; I would expect to see higher CPU utilization with that workload.