CPU - Top - PS discrepancy

Hello,

I am benchmarking Aerospike and I have noticed some weirdness with the reporting. I am using snmp to gather metrics and I see the CPU at around 4% (96% idle), which is way lower than I would expect considering that I have a large client simulating load on another machine.

05:59:50 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
05:59:50 PM  all    0.48    0.00    0.55    0.01    0.00    0.67    2.03    0.00    0.00   96.26

When I look at top I see the weirdness…

The CPU summary matches the 3-4% figure (almost totally idle), but the process list shows:

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
2326 root      20   0 6965640 2.001g   2940 S 500.0  6.9 155:12.33 asd

I see the process is using 500.0%, which for this 16-core machine would be about 31% of the total.
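For reference, the conversion from top's per-process figure to a share of the whole machine is just a division by the core count. A minimal sketch (the function name `machine_share` is made up for illustration):

```python
def machine_share(process_cpu_percent: float, cores: int) -> float:
    """Convert a per-process %CPU (where 100% = one full core)
    into a percentage of the whole machine's CPU capacity."""
    return process_cpu_percent / cores


# Values taken from the top output above: 500.0% on a 16-core machine.
print(machine_share(500.0, 16))  # 31.25
```

The same division applied to the 540% that ps reports gives roughly 34%.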

As it is a virtual machine, I see the total CPU usage of the VM is about 38%, which I would expect. Any ideas why Aerospike is misreporting its usage so wildly?

For another view,

mpstat shows less than 4% CPU utilization

mpstat
06:17:44 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:17:44 PM  all    0.50    0.00    0.57    0.01    0.00    0.73    1.91    0.00    0.00   96.29

540% cpu with ps …

ps aux | grep aero
root       2326  540  6.9 6973832 2106764 ?     Ssl  17:32 238:00 /usr/bin/asd --config-file /etc/aerospike/aerospike.conf --fgdaemon

Any ideas?

Is there a way to get the snmp data to show the actual usage of Aerospike?

What do you mean by ‘Aerospike is misreporting’? Do you think the Unix tools ‘ps’ and ‘top’ work by the processes pushing out stats voluntarily?

I am not saying Aerospike is pushing stats voluntarily. But I am suggesting that it is doing something in a way that is not reflected in the standard monitoring tools.

What I am guessing here is that Aerospike is “using the CPU” in a way that doesn’t show up in the CPU summary of the top command. I would like to know why this is happening, and if there is a way to effectively monitor the application…

I haven’t seen this big of a discrepancy in any other applications.

I see the same behavior on multiple OSes, and with various VM sizes.

You seem to be implying that Aerospike is circumventing the Linux scheduler and taking extra CPU time. This would be a really neat trick!

The CPU% discrepancy seen with top versus ps seems to be a common question on Google.

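Briefly, the tools average over different time windows. top reports CPU utilization averaged over its refresh interval, while ps reports the CPU time the process has used divided by the time it has been running, i.e. an average over the whole process lifetime. mpstat, when run without an interval argument as above, reports averages since system startup, so a recent burst of load barely moves its numbers.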
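To make the averaging-window point concrete, here is a minimal sketch of the two calculations. It assumes the procps behaviour described in the ps man page (CPU time divided by elapsed run time) and a clock tick rate of 100 per second; the function names and the sample tick counts are made up for illustration, and the 44-minute elapsed time is a guess based on the 17:32 start time and the timestamps in the thread.

```python
def interval_cpu_percent(ticks_start: int, ticks_end: int,
                         seconds: float, hz: int = 100) -> float:
    """CPU% over a sampling window, as an interval-based tool such as top
    would compute it. Ticks are cumulative CPU ticks (utime + stime) for
    the process; hz is the assumed number of ticks per second."""
    cpu_seconds = (ticks_end - ticks_start) / hz
    return 100.0 * cpu_seconds / seconds


def lifetime_cpu_percent(cpu_seconds: float, elapsed_seconds: float) -> float:
    """CPU% averaged over the whole life of the process (total CPU time
    divided by wall-clock time since start), as the ps man page describes."""
    return 100.0 * cpu_seconds / elapsed_seconds


# Hypothetical window: 1500 ticks consumed in 3 seconds, i.e. five busy cores.
print(interval_cpu_percent(0, 1500, 3.0))  # 500.0

# ps line above: TIME 238:00 for a process started at 17:32.
# Assumed elapsed time of about 44 minutes (not shown in the thread).
print(round(lifetime_cpu_percent(238 * 60, 44 * 60), 1))  # 540.9
```

The two figures only agree when the load has been steady for the whole life of the process, which is why top and ps can report different values for the same PID.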