High server CPU usage on google cloud


#1

I’m totally new to aerospike, but I’ve been playing around with it for a month or two on Google Compute Engine (GCE), using a single local SSD. Right now the server is just running on a single node with replication factor 1. Here’s my config:

service {
user root
group root
paxos-single-replica-limit 1
pidfile /var/run/aerospike/asd.pid
service-threads 8
transaction-queues 8
transaction-threads-per-queue 8
proto-fd-max 15000
}

The hardware is an ‘n1-highmem-4’ (4 cores, 26GB RAM).

Every time I examine CPU usage (via ‘top’, for instance), asd is pegged at 100%, more if I have an operation running. Is this expected? It’s just me running a few requests every now and then, so there should be basically zero load. I’ve restarted the server once or twice but I this doesn’t seem to help. Any ideas for what the issue might be or tips for what to investigate?


#2

You are right in that there should be almost zero load if there are no operations running.

  1. Can you run the “top” command and then press “1” - so that it shows the individual CPU details. It will be great if you attach the screenshot.
  2. Would like to see if there is anything unusual in the log (/var/log/aerospike/aerospike.log) Can you put a snippet here.
  3. Have you done any custom configurations (say “nsup-period”)? Can you share the complete config file.

#3

Hi there - looks like I can’t upload an image since I’m a “new user”, but here’s the text of ‘top’:

top - 21:12:06 up 36 days,  6:15,  5 users,  load average: 1.01, 1.03, 1.00
Tasks: 131 total,   1 running, 130 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.7%us,  0.3%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 97.7%us,  2.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  26676756k total, 23843508k used,  2833248k free,   180264k buffers
Swap:        0k total,        0k used,        0k free,  4398540k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6581 root      20   0 17.0g  14g 1172 S 101.4 55.8  36478:46 asd
 1388 root      20   0 90188 8528 2472 S  0.3  0.0   4:56.10 manage_addresse
13394 brendano  20   0 15036 1236  936 R  0.3  0.0   0:00.30 top
    1 root      20   0 19336 1416 1064 S  0.0  0.0   0:08.13 init

And here are the last 100 lines of /var/log/aerospike/aerospike.log …

Apr 13 2015 21:13:34 GMT: INFO (namespace): (namespace.c::444) {annotation} hwm_breached true (memory), stop_writes false, memory sz:14760036032 (14760036032 + 0) hwm:10307921920 sw:15461881856, disk sz:82695996544 hwm:201326592000
Apr 13 2015 21:13:35 GMT: INFO (drv_ssd): (drv_ssd.c::2398) device /dev/sdc: used 82695996544, contig-free 289305M (2314441 wblocks), swb-free 2, n-w 0, w-q 0 w-tot 3442870 (0.0/s), defrag-q 0 defrag-tot 2684002 (0.0/s)
Apr 13 2015 21:13:42 GMT: INFO (info): (thr_info.c::4565)  system memory: free 8126492kb ( 30 percent free )
Apr 13 2015 21:13:42 GMT: INFO (info): (thr_info.c::4573)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 230625563 ::: sub_objects 0
Apr 13 2015 21:13:42 GMT: INFO (info): (thr_info.c::4581)  rec refs 230649241 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Apr 13 2015 21:13:42 GMT: INFO (info): (thr_info.c::4587)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
Apr 13 2015 21:13:42 GMT: INFO (info): (thr_info.c::4597)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (6, 639, 633) : hb (0, 0, 0) : fab (16, 16, 0)
Apr 13 2015 21:13:42 GMT: INFO (info): (thr_info.c::4599)    heartbeat_received: self 20538694 : foreign 0
Apr 13 2015 21:13:42 GMT: INFO (info): (thr_info.c::4600)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
Apr 13 2015 21:13:42 GMT: INFO (info): (thr_info.c::4613)    tree_counts: nsup 1 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Apr 13 2015 21:13:42 GMT: INFO (info): (thr_info.c::4654) namespace annotation: disk inuse: 82695996544 memory inuse: 14760036032 (bytes) sindex memory inuse: 0 (bytes) avail pct 75 cache-read pct 0.00
Apr 13 2015 21:13:42 GMT: INFO (info): (thr_info.c::4674)    partitions: actual 4096 sync 0 desync 0 zombie 0 wait 0 absent 0
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::137) histogram dump: reads (643 total) msec
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::163)  (00: 0000000642) (01: 0000000001)
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (793872313 total) msec
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::154)  (00: 0791094034) (01: 0002114849) (02: 0000538178) (03: 0000083753)
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::154)  (04: 0000039061) (05: 0000001434) (06: 0000000558) (07: 0000000232)
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::163)  (08: 0000000193) (09: 0000000021)
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (796381248 total) msec
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::154)  (00: 0793607775) (01: 0002110914) (02: 0000537374) (03: 0000083699)
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::154)  (04: 0000039050) (05: 0000001432) (06: 0000000558) (07: 0000000232)
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::163)  (08: 0000000193) (09: 0000000021)
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Apr 13 2015 21:13:42 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Apr 13 2015 21:13:52 GMT: INFO (info): (thr_info.c::4565)  system memory: free 8126376kb ( 30 percent free )
Apr 13 2015 21:13:52 GMT: INFO (info): (thr_info.c::4573)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 230625563 ::: sub_objects 0
Apr 13 2015 21:13:52 GMT: INFO (info): (thr_info.c::4581)  rec refs 230633145 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Apr 13 2015 21:13:52 GMT: INFO (info): (thr_info.c::4587)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
Apr 13 2015 21:13:52 GMT: INFO (info): (thr_info.c::4597)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (6, 639, 633) : hb (0, 0, 0) : fab (16, 16, 0)
Apr 13 2015 21:13:52 GMT: INFO (info): (thr_info.c::4599)    heartbeat_received: self 20538760 : foreign 0
Apr 13 2015 21:13:52 GMT: INFO (info): (thr_info.c::4600)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
Apr 13 2015 21:13:52 GMT: INFO (info): (thr_info.c::4613)    tree_counts: nsup 1 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Apr 13 2015 21:13:52 GMT: INFO (info): (thr_info.c::4654) namespace annotation: disk inuse: 82695996544 memory inuse: 14760036032 (bytes) sindex memory inuse: 0 (bytes) avail pct 75 cache-read pct 0.00
Apr 13 2015 21:13:52 GMT: INFO (info): (thr_info.c::4674)    partitions: actual 4096 sync 0 desync 0 zombie 0 wait 0 absent 0
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::137) histogram dump: reads (643 total) msec
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::163)  (00: 0000000642) (01: 0000000001)
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (793872313 total) msec
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::154)  (00: 0791094034) (01: 0002114849) (02: 0000538178) (03: 0000083753)
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::154)  (04: 0000039061) (05: 0000001434) (06: 0000000558) (07: 0000000232)
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::163)  (08: 0000000193) (09: 0000000021)
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (796381248 total) msec
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::154)  (00: 0793607775) (01: 0002110914) (02: 0000537374) (03: 0000083699)
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::154)  (04: 0000039050) (05: 0000001432) (06: 0000000558) (07: 0000000232)
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::163)  (08: 0000000193) (09: 0000000021)
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Apr 13 2015 21:13:52 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Apr 13 2015 21:13:55 GMT: INFO (drv_ssd): (drv_ssd.c::2398) device /dev/sdc: used 82695996544, contig-free 289305M (2314441 wblocks), swb-free 2, n-w 0, w-q 0 w-tot 3442870 (0.0/s), defrag-q 0 defrag-tot 2684002 (0.0/s)
Apr 13 2015 21:14:02 GMT: INFO (info): (thr_info.c::4565)  system memory: free 8126940kb ( 30 percent free )
Apr 13 2015 21:14:02 GMT: INFO (info): (thr_info.c::4573)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 230625563 ::: sub_objects 0
Apr 13 2015 21:14:02 GMT: INFO (info): (thr_info.c::4581)  rec refs 230657044 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Apr 13 2015 21:14:02 GMT: INFO (info): (thr_info.c::4587)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
Apr 13 2015 21:14:02 GMT: INFO (info): (thr_info.c::4597)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (6, 639, 633) : hb (0, 0, 0) : fab (16, 16, 0)
Apr 13 2015 21:14:02 GMT: INFO (info): (thr_info.c::4599)    heartbeat_received: self 20538827 : foreign 0
Apr 13 2015 21:14:02 GMT: INFO (info): (thr_info.c::4600)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
Apr 13 2015 21:14:02 GMT: INFO (info): (thr_info.c::4613)    tree_counts: nsup 1 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Apr 13 2015 21:14:02 GMT: INFO (info): (thr_info.c::4654) namespace annotation: disk inuse: 82695996544 memory inuse: 14760036032 (bytes) sindex memory inuse: 0 (bytes) avail pct 75 cache-read pct 0.00
Apr 13 2015 21:14:02 GMT: INFO (info): (thr_info.c::4674)    partitions: actual 4096 sync 0 desync 0 zombie 0 wait 0 absent 0
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::137) histogram dump: reads (643 total) msec
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::163)  (00: 0000000642) (01: 0000000001)
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (793872313 total) msec
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::154)  (00: 0791094034) (01: 0002114849) (02: 0000538178) (03: 0000083753)
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::154)  (04: 0000039061) (05: 0000001434) (06: 0000000558) (07: 0000000232)
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::163)  (08: 0000000193) (09: 0000000021)
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (796381248 total) msec
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::154)  (00: 0793607775) (01: 0002110914) (02: 0000537374) (03: 0000083699)
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::154)  (04: 0000039050) (05: 0000001432) (06: 0000000558) (07: 0000000232)
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::163)  (08: 0000000193) (09: 0000000021)
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Apr 13 2015 21:14:02 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Apr 13 2015 21:14:12 GMT: INFO (info): (thr_info.c::4565)  system memory: free 8126728kb ( 30 percent free )
Apr 13 2015 21:14:12 GMT: INFO (info): (thr_info.c::4573)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 230625563 ::: sub_objects 0
Apr 13 2015 21:14:12 GMT: INFO (info): (thr_info.c::4581)  rec refs 230664854 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Apr 13 2015 21:14:12 GMT: INFO (info): (thr_info.c::4587)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
Apr 13 2015 21:14:12 GMT: INFO (info): (thr_info.c::4597)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (6, 639, 633) : hb (0, 0, 0) : fab (16, 16, 0)
Apr 13 2015 21:14:12 GMT: INFO (info): (thr_info.c::4599)    heartbeat_received: self 20538893 : foreign 0
Apr 13 2015 21:14:12 GMT: INFO (info): (thr_info.c::4600)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
Apr 13 2015 21:14:12 GMT: INFO (info): (thr_info.c::4613)    tree_counts: nsup 1 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Apr 13 2015 21:14:12 GMT: INFO (info): (thr_info.c::4654) namespace annotation: disk inuse: 82695996544 memory inuse: 14760036032 (bytes) sindex memory inuse: 0 (bytes) avail pct 75 cache-read pct 0.00
Apr 13 2015 21:14:12 GMT: INFO (info): (thr_info.c::4674)    partitions: actual 4096 sync 0 desync 0 zombie 0 wait 0 absent 0
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::137) histogram dump: reads (643 total) msec
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::163)  (00: 0000000642) (01: 0000000001)
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (793872313 total) msec
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::154)  (00: 0791094034) (01: 0002114849) (02: 0000538178) (03: 0000083753)
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::154)  (04: 0000039061) (05: 0000001434) (06: 0000000558) (07: 0000000232)
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::163)  (08: 0000000193) (09: 0000000021)
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (796381248 total) msec
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::154)  (00: 0793607775) (01: 0002110914) (02: 0000537374) (03: 0000083699)
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::154)  (04: 0000039050) (05: 0000001432) (06: 0000000558) (07: 0000000232)
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::163)  (08: 0000000193) (09: 0000000021)
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Apr 13 2015 21:14:12 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Apr 13 2015 21:14:15 GMT: INFO (drv_ssd): (drv_ssd.c::2398) device /dev/sdc: used 82695996544, contig-free 289305M (2314441 wblocks), swb-free 2, n-w 0, w-q 0 w-tot 3442870 (0.0/s), defrag-q 0 defrag-tot 2684002 (0.0/s)

Lastly, I haven’t done any custom configurations, but the initial setup was using GCE’s “click to deploy” feature - maybe there are some funky defaults in there?

Thanks for the help!


#4

It appears that the high-water-mark has been reached for you namespace, “annotation” - from the first line of your log extract.

Get the stats for your namespace using:

asinfo -h <host ip> -v 'namespace/<namespace name>'

and check the hwm-breached parameter.

Read more at: http://www.aerospike.com/docs/operations/manage/storage


#5

Great, thank you, I hadn’t noticed that. My objects are set to never expire, so when the HWM is passed, does asd kick off a process to try to evict objects, but there are none that can be evicted, so it just searches indefinitely without finding anything? Thanks again, this is really helpful - just trying to understand how things work here.