The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.
FAQ - Why is a single core running at 100% intermittently?
Detail
When running an Aerospike server, operating system metrics show that a single CPU core is periodically close to or at 100% utilization. What is the reason for this?
An example of how the CPU utilization looks, as reported by mpstat -P ALL 2 3:
07:14:21 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
07:14:23 PM all 4.26 0.00 0.94 3.95 0.00 0.44 0.00 0.00 0.00 90.40
07:14:23 PM 0 3.06 0.00 2.04 0.00 0.00 1.02 0.00 0.00 0.00 93.88
07:14:23 PM 1 2.06 0.00 1.55 0.00 0.00 1.55 0.00 0.00 0.00 94.85
07:14:23 PM 2 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
07:14:23 PM 3 2.00 0.00 1.50 0.00 0.00 2.50 0.00 0.00 0.00 94.00
07:14:23 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
07:14:25 PM all 3.83 0.00 0.71 4.14 0.00 0.33 0.00 0.00 0.00 90.99
07:14:25 PM 0 1.01 0.00 2.02 0.51 0.00 1.52 0.00 0.00 0.00 94.95
07:14:25 PM 1 0.51 0.00 1.03 0.00 0.00 1.54 0.00 0.00 0.00 96.92
07:14:25 PM 2 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
07:14:25 PM 3 1.52 0.00 1.01 0.00 0.00 1.01 0.00 0.00 0.00 96.46
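As a rough sketch, the busy core can be picked out of this kind of output with awk. The sample rows are copied from the output above. The 5% idle threshold is an arbitrary illustrative value, and the script assumes the CPU id is the third field, which holds for the 12-hour timestamp format shown here but not for every locale.

```shell
# Sample rows copied from the mpstat output above
# (fields: time, AM/PM, CPU, %usr ... %idle).
sample='07:14:23 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
07:14:23 PM all 4.26 0.00 0.94 3.95 0.00 0.44 0.00 0.00 0.00 90.40
07:14:23 PM 0 3.06 0.00 2.04 0.00 0.00 1.02 0.00 0.00 0.00 93.88
07:14:23 PM 2 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00'

# Keep only per-core rows (numeric CPU id in field 3) whose %idle
# (last field) is below the illustrative 5% threshold, and print the CPU id.
busy_cpus=$(printf '%s\n' "$sample" | awk '$3 ~ /^[0-9]+$/ && $NF + 0 < 5 { print $3 }')
echo "$busy_cpus"
```

On a live host the same awk filter could be fed directly from mpstat -P ALL 2 3 instead of the sample text.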
Answer
This is due to the Aerospike Namespace Supervisor process, nsup. The Namespace Supervisor is responsible for operations such as eviction and expiration. By default nsup is single threaded. As of Aerospike 4.5.1.5, this can be changed through the nsup-threads configuration parameter. Aerospike 4.5.1.5 also introduced a more efficient algorithm for expiring and evicting records, which does not rely on the fabric channel and directly expires or evicts records as each partition is reduced by an nsup thread.
An nsup cycle can be time consuming as it has to cycle through all records in a given namespace. The frequency with which nsup runs is controlled by nsup-period, which defines the time period between nsup waking up for one run and waking up for the next. If the time taken for an nsup cycle is greater than nsup-period then, in effect, nsup will be running continuously.
The behaviour of nsup can be observed using the following log line:
{ns-name} nsup-done: non-expirable 42162 expired (576066,922) evicted (24000935,259985) evict-ttl 134000 total-ms 120583
In the example above, the nsup cycle took 120583 ms, which exceeds the default nsup-period of 120 s, and so here nsup would appear to be running all the time. This is not a problem and is a normal part of Aerospike operation. In versions later than Aerospike 4.5.1.5 it is more obvious what is happening, because nsup is not context switched between cores and keeps one particular CPU busy all the time.
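The comparison described here can be sketched in a few lines of shell. The log line is the one quoted in this article; the variable names are illustrative, and the 120-second period is the default value mentioned in the text, so it should be replaced with the configured nsup-period if that differs.

```shell
# nsup-done log line as quoted in this article.
line='{ns-name} nsup-done: non-expirable 42162 expired (576066,922) evicted (24000935,259985) evict-ttl 134000 total-ms 120583'

# Default nsup-period in seconds (assumption: the namespace uses the default).
nsup_period_s=120

# Extract the number that follows "total-ms".
total_ms=$(printf '%s\n' "$line" | sed -n 's/.*total-ms \([0-9][0-9]*\).*/\1/p')

# Compare the cycle time with nsup-period, both expressed in milliseconds.
if [ "$total_ms" -ge $((nsup_period_s * 1000)) ]; then
  verdict="nsup cycle (${total_ms} ms) meets or exceeds nsup-period (${nsup_period_s} s)"
else
  verdict="nsup cycle (${total_ms} ms) fits within nsup-period (${nsup_period_s} s)"
fi
echo "$verdict"
```

When the cycle time meets or exceeds the period, nsup has little or no idle time between runs, which matches the constantly busy core seen in the CPU statistics.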
To validate that the 100% CPU usage is due to nsup running, nsup can be disabled on a temporary basis by setting nsup-period to 0. This can be done dynamically.
asinfo -v "set-config:context=namespace;id=namespaceName;nsup-period=0"
ok
Once nsup has been shown to be the reason for the CPU showing 100%, it should be re-enabled (by setting a non-zero nsup-period). If nsup is not re-enabled, records cannot be expired (or evicted). If records are not expected to expire, nsup can be left permanently disabled in this way.
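A minimal sketch of re-enabling nsup follows, using the same set-config form as the disable command shown earlier. The namespace name is a placeholder, and 120 is simply the default nsup-period cited in this article; use whatever value the namespace had before it was disabled. The script only prints the command (a dry run) rather than executing it against a node.

```shell
# Placeholder namespace name; substitute your own.
namespace="namespaceName"

# Value to restore, in seconds (assumption: the default of 120 was in use).
nsup_period=120

# Same set-config form as the disable command, with a non-zero period.
reenable_cmd="set-config:context=namespace;id=${namespace};nsup-period=${nsup_period}"

# Dry run: show the command instead of running it against a live node.
echo "asinfo -v \"${reenable_cmd}\""
```

Running the printed command on each node should return ok, as in the disable example.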
Notes
- Some anecdotal differences have been observed between the previous versions (prior to 4.5.1) and the new ones (4.5.1 and above). Specifically, it seems that older versions may be more likely to switch between CPU cores across nsup runs compared to the new versions.
- Should it be required, nsup-threads can be increased to decrease the time taken for an nsup cycle and spread the load across multiple cores. While making these changes, it is recommended to keep a watch on normal read/write latencies so that the change does not affect them.
Keywords
NSUP 100% CPU CORE UTILISED