What is the %steal CPU statistic in mpstat

FAQ - What is the %steal CPU statistic in mpstat

Detail

The mpstat command reports several columns of CPU statistics, both for all CPUs combined and for each individual CPU. This short article describes each column and focuses specifically on the %steal part of the output.

16:10:08     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
16:10:10     all    7.87    0.00   11.61    6.07    0.00    4.87    1.73    0.00    0.00   67.85
16:10:10       0    9.28    0.00   12.89    5.67    0.00    5.15    1.55    0.00    0.00   65.46
16:10:10       1    6.91    0.00   13.30    5.85    0.00    2.66    1.60    0.00    0.00   69.68
16:10:10       2    7.53    0.00   11.83    5.91    0.00    5.91    2.15    0.00    0.00   66.67
16:10:10       3    8.60    0.00   10.75    5.38    0.00    4.84    2.15    0.00    0.00   68.28
16:10:10       4    8.47    0.00   11.11    6.35    0.00    5.29    1.59    0.00    0.00   67.20
16:10:10       5    5.43    0.00   13.59    4.89    0.00    5.98    2.17    0.00    0.00   67.93
16:10:10       6    8.38    0.00    9.42    7.33    0.00    5.24    2.09    0.00    0.00   67.54
16:10:10       7    7.69    0.00    9.34    7.14    0.00    4.40    1.65    0.00    0.00   69.78

As seen in the output above, mpstat provides the following:

name detail
%usr percentage of CPU time spent in userland, running applications
%nice percentage of CPU time spent running userland applications whose priority has been adjusted with ‘nice’
%sys percentage of CPU time spent in the kernel and drivers, not counting the interrupt handling reported under %irq and %soft
%iowait percentage of time the CPU was idle while the system had outstanding disk I/O requests. High values suggest a storage bottleneck
%irq percentage of CPU time spent servicing hardware interrupts
%soft percentage of CPU time spent servicing software interrupts. This will often relate to network drivers and tends to rise together with %sys
%steal percentage of time the virtual CPU spent waiting involuntarily while the hypervisor serviced another virtual CPU (this is discussed in detail below)
%guest percentage of CPU time spent running a virtual CPU for a guest virtual machine. This will only be non-zero if the machine itself is a hypervisor
%gnice the same as %guest, but for guest virtual machines that have a ‘nice’ priority applied (niced guests)
%idle percentage of time the CPU was idle with no outstanding disk I/O requests
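For monitoring, the columns described above can be extracted from mpstat text with a few lines of Python. The following is a minimal sketch, not an official tool: the function name `parse_mpstat` and the embedded `SAMPLE` text (a few rows copied from the output above) are illustrative assumptions, and the parser assumes the header line contains both `CPU` and `%steal`, as in the layout shown here.

```python
# Rows copied from the mpstat output shown earlier in this article.
SAMPLE = """\
16:10:08     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
16:10:10     all    7.87    0.00   11.61    6.07    0.00    4.87    1.73    0.00    0.00   67.85
16:10:10       0    9.28    0.00   12.89    5.67    0.00    5.15    1.55    0.00    0.00   65.46
16:10:10       2    7.53    0.00   11.83    5.91    0.00    5.91    2.15    0.00    0.00   66.67
"""


def parse_mpstat(text):
    """Return {cpu_label: {column_name: value}} parsed from mpstat-style text.

    Hypothetical helper for illustration; assumes whitespace-separated
    columns and a header line that contains 'CPU' and '%steal'.
    """
    header = None
    cpu_idx = None
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if "CPU" in fields and "%steal" in fields:
            # Header line: remember the column names and where 'CPU' sits.
            header = fields
            cpu_idx = fields.index("CPU")
            continue
        if header is None or len(fields) != len(header):
            # Skip blank lines and anything that does not match the header.
            continue
        names = header[cpu_idx + 1:]
        values = fields[cpu_idx + 1:]
        rows[fields[cpu_idx]] = {n: float(v) for n, v in zip(names, values)}
    return rows


stats = parse_mpstat(SAMPLE)
for cpu, columns in stats.items():
    print(cpu, columns["%steal"])
```

If the same CPU label appears more than once in the input (for example, several samples followed by an Average line), this sketch keeps only the last row seen for that label.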

Answer

When running Aerospike on a virtualized platform, it is particularly important to monitor %steal. This value shows the percentage of time the hypervisor has “stolen” from the vCPU. In other words, it shows how much time the vCPU core has spent waiting while the physical CPU core deals with another vCPU. With a 1:1 ratio of vCPUs to physical CPU cores, this number should be at or close to 0. On overprovisioned hosts (i.e. those where the total vCPU count exceeds the physical CPU count on the underlying hardware), this number will increase as virtual machines compete for the physical cores.
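On Linux, the same steal figure can be derived from the counters in /proc/stat, where steal is the eighth value on the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal). The sketch below computes the steal percentage between two snapshots. The function names and the `BEFORE`/`AFTER` snapshot strings are made up for illustration, and the calculation is a simplification that is not guaranteed to match mpstat digit for digit.

```python
import time


def cpu_times(stat_text):
    """Return the first eight counters from the aggregate 'cpu' line.

    Order: user, nice, system, idle, iowait, irq, softirq, steal.
    The guest counters that follow are skipped here because the kernel
    already includes guest time in user and nice.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Percentage of elapsed CPU time reported as steal between snapshots."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def sample_live(interval=2.0):
    """Take two live snapshots of /proc/stat (Linux only) and compare them."""
    with open("/proc/stat") as f:
        first = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = f.read()
    return steal_percent(first, second)


# Hypothetical snapshots: 1000 ticks elapse in total, 20 of them as steal.
BEFORE = "cpu  1000 0 500 8000 100 0 50 20 0 0\n"
AFTER = "cpu  1080 0 620 8670 160 0 100 40 0 0\n"
print(steal_percent(BEFORE, AFTER))  # prints 2.0
```

Calling `sample_live()` inside a virtual machine gives a quick reading without mpstat installed; on a host with a 1:1 vCPU ratio the result should stay at or near 0.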

If you see this number increase, check with your cloud team (if hosting internally) or cloud vendor (if using a public cloud host).

Also note that a %steal of just 2% is significant. Although the stolen time is not actually taken in a single block, it can be visualised as the vCPU core not doing anything for 1 second out of every 50 seconds. During this time, the physical CPU core is dealing with a vCPU request from another virtual machine.
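The arithmetic behind the "1 second out of each 50" picture is simple enough to sketch. The helper name `stolen_seconds` below is hypothetical, and the figures are plain proportions rather than measurements.

```python
def stolen_seconds(steal_pct, window_seconds):
    """Seconds of a time window during which the vCPU was kept waiting."""
    return steal_pct * window_seconds / 100.0


print(stolen_seconds(2.0, 50))     # prints 1.0
print(stolen_seconds(2.0, 3600))   # prints 72.0
print(stolen_seconds(2.0, 86400))  # prints 1728.0
```

Seen this way, a steady 2% steal adds up to 72 seconds of lost vCPU time per hour, or close to half an hour per day.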

Notes

Keywords

mpstat steal %steal overprovision vcpu cloud

Timestamp

March 2020

© 2015 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.