Intermittent spikes in user mode cpu usage only on a single aerospike node

Hi, have recently noticed intermittent (usually equal time intervals) spikes in user mode cpu usage only on one node is the aerspike cluster [have 2 nodes with replication factor 2] Have skimmed through the logs - haven’t found anything out of the ordinary there Only one pattern observed is that this spike is usually after the defrag statement in logs - however the spike is not always present after the defrag log Memory used is hardly 5-6% of the total available This behaviour has started recently and we haven’t made any config changes Following is the graph of the user mode cpu usage

Is there some config change that would balance out this behaviour. Also the traffic trend during this entire period is not that spiky and average qps is well below 500

Which version of Aerospike are you running?

Can you share your configuration?

Logs should also be useful, especially lines mentioning “nsup” and possibly the “defrag” lines you mentioned.

We are using version 5.5.0.3

following is the config

service {
        paxos-single-replica-limit 1 
        proto-fd-max 100000
        migrate-max-num-incoming 5
        migrate-threads 1
        batch-max-buffers-per-queue 512
        proto-fd-idle-ms 70000
}


namespace mycache {
        replication-factor 2
        memory-size 29G
        default-ttl 30d
        allow-ttl-without-nsup true
        nsup-period 120
        high-water-disk-pct 80
        high-water-memory-pct 90
        stop-writes-pct 100

        storage-engine device {
                device /dev/nvme0n1p1 /dev/sdb
                write-block-size 1M
                defrag-lwm-pct 50
        }
}

Also below is a snippet of logs around the time where we are seeing a spike in the cpu usage

Nov 04 2021 07:02:43 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31893284624 free-wblocks 351871 write-q 0 write (8061207,7.2) defrag-q 0 defrag-read (7995980,0.0) defrag-write (194836,0.0) shadow-write-q 0
Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:168) NODE-ID bb9e0032d0a0142 CLUSTER-SIZE 2
Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:250)    cluster-clock: skew-ms 0
**Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:273)    system: total-cpu-pct 43 user-cpu-pct 32 kernel-cpu-pct 11 free-mem-kbytes 30308972 free-mem-pct 92**
Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:291)    process: cpu-pct 18 heap-kbytes (2564734,2587808,3373056) heap-efficiency-pct 76.0
Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:302)    in-progress: info-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:323)    fds: proto (246,382023,381777) heartbeat (1,4,3) fabric (24,24,0)
Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:331)    heartbeat-received: self 2 foreign 4569625
Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:361)    fabric-bytes-per-second: bulk (0,0) ctrl (0,0) meta (0,0) rw (3974660,3777920)
Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:424) {mycache} objects: all 1694509 master 859042 prole 835467 non-replica 0
Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:486) {mycache} migrations: complete
Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:513) {mycache} memory-usage: total-bytes 108448576 index-bytes 108448576 sindex-bytes 0 used-pct 0.35
Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:581) {mycache} device-usage: used-bytes 31950711424 avail-pct 91 cache-read-pct 20.47
Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:635) {mycache} client: tsvc (0,0) proxy (0,0,0) read (373369430,0,0,214062063,0) write (215195612,5173,0,0) delete (0,0,0,0,0) udf (0,0,0,0) lang (0,0,0,0)
Nov 04 2021 07:02:51 GMT: INFO (info): (ticker.c:909) {mycache} dup-res: ask 0 respond (12,1885)
Nov 04 2021 07:02:51 GMT: INFO (info): (hist.c:321) histogram dump: {mycache}-read (587431493 total) msec
Nov 04 2021 07:02:51 GMT: INFO (info): (hist.c:331)  (00: 0587302328) (01: 0000077689) (02: 0000050459) (03: 0000001001)
Nov 04 2021 07:02:51 GMT: INFO (info): (hist.c:340)  (04: 0000000015) (05: 0000000001)
Nov 04 2021 07:02:51 GMT: INFO (info): (hist.c:321) histogram dump: {mycache}-write (215200785 total) msec
Nov 04 2021 07:02:51 GMT: INFO (info): (hist.c:331)  (00: 0215184310) (01: 0000007997) (02: 0000004558) (03: 0000002342)
Nov 04 2021 07:02:51 GMT: INFO (info): (hist.c:331)  (04: 0000001418) (05: 0000000047) (06: 0000000065) (07: 0000000025)
Nov 04 2021 07:02:51 GMT: INFO (info): (hist.c:340)  (08: 0000000023)

Following are the logs with “nsup”

Nov 04 2021 06:43:14 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 06:43:14 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (419334177,40112) evicted (0,0) evict-ttl 0 total-ms 258
Nov 04 2021 06:45:13 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 06:45:13 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (419372470,38293) evicted (0,0) evict-ttl 0 total-ms 254
Nov 04 2021 06:47:13 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 06:47:13 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (419411210,38740) evicted (0,0) evict-ttl 0 total-ms 255
Nov 04 2021 06:49:13 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 06:49:14 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (419450120,38910) evicted (0,0) evict-ttl 0 total-ms 258
Nov 04 2021 06:51:14 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 06:51:14 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (419508440,58320) evicted (0,0) evict-ttl 0 total-ms 264
Nov 04 2021 06:53:13 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 06:53:13 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (419568202,59762) evicted (0,0) evict-ttl 0 total-ms 262
Nov 04 2021 06:55:13 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 06:55:13 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (419628696,60494) evicted (0,0) evict-ttl 0 total-ms 260
Nov 04 2021 06:57:13 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 06:57:14 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (419687571,58875) evicted (0,0) evict-ttl 0 total-ms 264
Nov 04 2021 06:59:14 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 06:59:14 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (419747046,59475) evicted (0,0) evict-ttl 0 total-ms 259
Nov 04 2021 07:01:13 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 07:01:13 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (419802790,55744) evicted (0,0) evict-ttl 0 total-ms 259
Nov 04 2021 07:03:13 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 07:03:14 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (419859961,57171) evicted (0,0) evict-ttl 0 total-ms 257
Nov 04 2021 07:05:14 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 07:05:14 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (419917215,57254) evicted (0,0) evict-ttl 0 total-ms 258
Nov 04 2021 07:07:14 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 07:07:14 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (419974868,57653) evicted (0,0) evict-ttl 0 total-ms 258
Nov 04 2021 07:09:13 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 07:09:13 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (420032097,57229) evicted (0,0) evict-ttl 0 total-ms 257
Nov 04 2021 07:11:13 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 07:11:14 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (420090332,58235) evicted (0,0) evict-ttl 0 total-ms 257
Nov 04 2021 07:13:14 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 07:13:14 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (420147245,56913) evicted (0,0) evict-ttl 0 total-ms 255
Nov 04 2021 07:15:13 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 07:15:13 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (420203453,56208) evicted (0,0) evict-ttl 0 total-ms 254
Nov 04 2021 07:17:13 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 07:17:13 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (420259090,55637) evicted (0,0) evict-ttl 0 total-ms 254
Nov 04 2021 07:19:13 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 07:19:14 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (420315914,56824) evicted (0,0) evict-ttl 0 total-ms 255
Nov 04 2021 07:21:14 GMT: INFO (nsup): (nsup.c:402) {mycache} nsup-start: expire-threads 1
Nov 04 2021 07:21:14 GMT: INFO (nsup): (nsup.c:814) {mycache} nsup-done: non-expirable 0 expired (420372498,56584) evicted (0,0) evict-ttl 0 total-ms 257

following are the logs with defrag

Nov 04 2021 06:59:03 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 32329680384 free-wblocks 351448 write-q 0 write (8059446,7.7) defrag-q 0 defrag-read (7993807,0.0) defrag-write (194783,0.0) shadow-write-q 0
Nov 04 2021 06:59:23 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31360345824 free-wblocks 352390 write-q 0 write (8059631,9.2) defrag-q 0 defrag-read (7994927,56.0) defrag-write (194810,1.4) shadow-write-q 0
Nov 04 2021 06:59:43 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31523514864 free-wblocks 352231 write-q 0 write (8059790,7.9) defrag-q 0 defrag-read (7994927,0.0) defrag-write (194810,0.0) shadow-write-q 0
Nov 04 2021 07:00:03 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31683855024 free-wblocks 352076 write-q 0 write (8059945,7.8) defrag-q 0 defrag-read (7994927,0.0) defrag-write (194810,0.0) shadow-write-q 0
Nov 04 2021 07:00:23 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31848396864 free-wblocks 351916 write-q 0 write (8060105,8.0) defrag-q 0 defrag-read (7994927,0.0) defrag-write (194810,0.0) shadow-write-q 0
Nov 04 2021 07:00:43 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 32003344048 free-wblocks 351764 write-q 0 write (8060257,7.6) defrag-q 0 defrag-read (7994927,0.0) defrag-write (194810,0.0) shadow-write-q 0
Nov 04 2021 07:01:03 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 32163776896 free-wblocks 351607 write-q 0 write (8060414,7.8) defrag-q 0 defrag-read (7994927,0.0) defrag-write (194810,0.0) shadow-write-q 0
Nov 04 2021 07:01:23 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31261801728 free-wblocks 352486 write-q 0 write (8060592,8.9) defrag-q 0 defrag-read (7995980,52.7) defrag-write (194836,1.3) shadow-write-q 0
Nov 04 2021 07:01:43 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31425142112 free-wblocks 352327 write-q 0 write (8060751,7.9) defrag-q 0 defrag-read (7995980,0.0) defrag-write (194836,0.0) shadow-write-q 0
Nov 04 2021 07:02:03 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31588427824 free-wblocks 352168 write-q 0 write (8060910,7.9) defrag-q 0 defrag-read (7995980,0.0) defrag-write (194836,0.0) shadow-write-q 0
Nov 04 2021 07:02:23 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31745296656 free-wblocks 352015 write-q 0 write (8061063,7.7) defrag-q 0 defrag-read (7995980,0.0) defrag-write (194836,0.0) shadow-write-q 0
Nov 04 2021 07:02:43 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31893284624 free-wblocks 351871 write-q 0 write (8061207,7.2) defrag-q 0 defrag-read (7995980,0.0) defrag-write (194836,0.0) shadow-write-q 0
Nov 04 2021 07:03:03 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 32048819216 free-wblocks 351719 write-q 0 write (8061359,7.6) defrag-q 0 defrag-read (7995980,0.0) defrag-write (194836,0.0) shadow-write-q 0
Nov 04 2021 07:03:23 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31123792608 free-wblocks 352619 write-q 0 write (8061541,9.1) defrag-q 0 defrag-read (7997057,53.8) defrag-write (194864,1.4) shadow-write-q 0
Nov 04 2021 07:03:43 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31285244336 free-wblocks 352461 write-q 0 write (8061699,7.9) defrag-q 0 defrag-read (7997057,0.0) defrag-write (194864,0.0) shadow-write-q 0
Nov 04 2021 07:04:03 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31447276400 free-wblocks 352304 write-q 0 write (8061856,7.8) defrag-q 0 defrag-read (7997057,0.0) defrag-write (194864,0.0) shadow-write-q 0
Nov 04 2021 07:04:23 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31605622896 free-wblocks 352149 write-q 0 write (8062011,7.8) defrag-q 0 defrag-read (7997057,0.0) defrag-write (194864,0.0) shadow-write-q 0
Nov 04 2021 07:04:43 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31768407088 free-wblocks 351991 write-q 0 write (8062169,7.9) defrag-q 0 defrag-read (7997057,0.0) defrag-write (194864,0.0) shadow-write-q 0
Nov 04 2021 07:05:03 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31931393536 free-wblocks 351831 write-q 0 write (8062329,8.0) defrag-q 0 defrag-read (7997057,0.0) defrag-write (194864,0.0) shadow-write-q 0
Nov 04 2021 07:05:23 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31006466560 free-wblocks 352732 write-q 0 write (8062515,9.3) defrag-q 0 defrag-read (7998137,54.0) defrag-write (194892,1.4) shadow-write-q 0
Nov 04 2021 07:05:43 GMT: INFO (drv_ssd): (drv_ssd.c:1829) {mycache} /dev/nvme0n1p1: used-bytes 31169391552 free-wblocks 352574 write-q 0 write (8062673,7.9) defrag-q 0 defrag-read (7998137,0.0) defrag-write (194892,0.0) shadow-write-q 0

@romil_shah you have nsup_period equal for 2 minutes, therefore every 2 minutes you can watch spikes on ASD CPU usage. Try to increase nsup_period

I’m don’t think NSUP is a problem. The spikes in the image appear every 30 min (assuming the x-axis is hours). Also, I don’t believe defrag is to blame either, it doesn’t appear to be handling much load.

The usual question would be “is anything else running on that machine?”, and “is this virtualized?”

Nothing else running on that machine. Have set this is up on an N2 8cpu box on GCP.

We also upgraded the aerospike version to 5.7.0.8 (but no change)

Also the GCP cpu graphs for the box tell a different story - (minute wise graph)

and for the same duration the Aerospike graph

Also the aerospike logs do show bloated values during the spikes (as shared above in the thread) - so the graph being plotted is not incorrect

Let me know if i am missing something here

Looks like GCP may be telling a similar story (notice the small peaks that happen at roughly the same intervals. The GCP graph’s points are averaged over 60 seconds while the utilization in the logs is averaged over 10-second intervals.

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.