CPU - Top - PS discrepancy


#1

Hello,

I am benchmarking Aerospike and I have noticed some weirdness with the reporting. I am using SNMP to gather metrics and I see the CPU at around 4% (96% idle), which is way lower than I would expect considering that I have a large client simulating load from another machine.

05:59:50 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
05:59:50 PM  all    0.48    0.00    0.55    0.01    0.00    0.67    2.03    0.00    0.00   96.26

When I look at top I see the weirdness…

The overall CPU figure matches the 3-4% (almost totally idle), but the process list shows:

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
2326 root      20   0 6965640 2.001g   2940 S 500.0  6.9 155:12.33 asd

I see the process is using 500.0%, which for this 16-core machine would be about 31% of the total.
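As a rough check of that conversion, here is a minimal sketch. It assumes top is in its default mode, where 100% means one fully busy core; the helper name `share_of_machine` is made up for illustration and is not part of any tool.

```python
def share_of_machine(process_cpu_percent: float, cores: int) -> float:
    """Convert a %CPU reading where 100 = one full core into a share of the whole machine."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return process_cpu_percent / cores


# 500.0 is the top reading and 540.0 the ps reading quoted in this thread.
for reading in (500.0, 540.0):
    share = share_of_machine(reading, 16)
    print(f"{reading:.0f}% on a 16-core machine = {share:.2f}% of total capacity")
```

The top reading of 500% works out to 31.25% of the machine, and the ps reading of 540% to 33.75%.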

As it is a virtual machine, I can see that the total CPU usage of the VM is about 38%, which is what I would expect. Any ideas why Aerospike is misreporting its usage so wildly?

For another view,

mpstat shows less than 4% CPU utilization

mpstat
06:17:44 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:17:44 PM  all    0.50    0.00    0.57    0.01    0.00    0.73    1.91    0.00    0.00   96.29

ps shows 540% CPU:

ps aux | grep aero
root       2326  540  6.9 6973832 2106764 ?     Ssl  17:32 238:00 /usr/bin/asd --config-file /etc/aerospike/aerospike.conf --fgdaemon

Any ideas?

Is there a way to get the SNMP data to show the actual usage of Aerospike?


#2

What do you mean by ‘Aerospike is misreporting’? Do you think the Unix tools ‘ps’ and ‘top’ work by the processes pushing out stats voluntarily?


#3

I am not saying Aerospike is pushing stats voluntarily. But I am suggesting that it is doing something in a way that is not reflected in the standard monitoring tools.

What I am guessing here is that Aerospike is “using the CPU” in a way that doesn’t show up in the top command. I would like to know why this is happening, and if there is a way to effectively monitor the application…

I haven’t seen this big of a discrepancy in any other applications.

I see the same behavior on multiple OSes, and with various VM sizes.


#4

You seem to be implying that Aerospike is circumventing the Linux scheduler and taking extra CPU time. This would be a really neat trick!

The CPU% discrepancy seen with top versus ps seems to be a commonly asked question on Google.

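Briefly, top (and likely mpstat, when given an interval) reports CPU utilization averaged over a short sampling interval, while ps reports the process's accumulated CPU time divided by the time the process has been running, so the two numbers do not have to agree.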
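To make the difference concrete, here is a minimal sketch of the two calculations. The function names are hypothetical, and the 44-minute elapsed time is an assumption (the process started at 17:32 and the ps sample time is not shown in the thread); only the 238:00 of accumulated CPU time comes from the ps output above.

```python
def lifetime_cpu_percent(cpu_seconds: float, elapsed_seconds: float) -> float:
    """ps-style figure: total CPU time used divided by time since the process started."""
    return 100.0 * cpu_seconds / elapsed_seconds


def interval_cpu_percent(cpu_before: float, cpu_after: float, interval_seconds: float) -> float:
    """top-style figure: CPU time consumed during one sampling interval."""
    return 100.0 * (cpu_after - cpu_before) / interval_seconds


# TIME column from the ps output in this thread: 238:00 (minutes:seconds).
cpu_seconds = 238 * 60
# ASSUMPTION: roughly 44 minutes between the 17:32 start and the ps sample.
elapsed_seconds = 44 * 60
print(f"lifetime average: {lifetime_cpu_percent(cpu_seconds, elapsed_seconds):.1f}%")

# Hypothetical interval sample: 10 CPU-seconds consumed during a 2-second window.
print(f"interval average: {interval_cpu_percent(100.0, 110.0, 2.0):.1f}%")
```

With those assumed inputs the lifetime figure lands near the 540% that ps printed, while the interval figure depends only on what happened during the most recent window.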