Hi,
I noticed that one of my nodes was not showing up in asmonitor so I connected to it and found something odd. The root partition local disk was full and asd was down, however, very shortly after the root partition was back to normal. I would usually attribute this to log rotation but it didn’t look like the logs were particularly big or recently rotate. Is there anything that aerospike writes to the root disk (other than logs)? The only things running on that instance are asd, amc (without any clients attached) and diamond (a lightweight service that gathers system stats and sends them to graphite).
Here’s the aerospike log at the time:
Oct 08 2015 21:45:24 GMT: INFO (partition): (partition.c::2121) blocking until pending writes are done: {biddingdb:2609}, pstate 1, 1 to go 159 centisecs
Oct 08 2015 21:45:25 GMT: INFO (partition): (partition.c::2121) blocking until pending writes are done: {biddingdb:2609}, pstate 1, 1 to go 169 centisecs
Oct 08 2015 21:45:26 GMT: INFO (partition): (partition.c::2121) blocking until pending writes are done: {biddingdb:2609}, pstate 1, 1 to go 179 centisecs
Oct 08 2015 21:45:26 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /dev/xvdb: used 141523437184, contig-free 401483M (802966 wblocks), swb-free 2, n-w 0, w- q 0 w-tot 4950739 (11.6/s), defrag-q 0 defrag-tot 4546759 (11.4/s)
Oct 08 2015 21:45:27 GMT: INFO (partition): (partition.c::2121) blocking until pending writes are done: {biddingdb:2609}, pstate 1, 1 to go 189 centisecs
Oct 08 2015 21:45:27 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /dev/xvdc: used 141387127168, contig-free 402152M (804305 wblocks), swb-free 1, n-w 0, w- q 0 w-tot 4940578 (11.8/s), defrag-q 0 defrag-tot 4537311 (11.1/s)
Oct 08 2015 21:45:28 GMT: INFO (partition): (partition.c::2121) blocking until pending writes are done: {biddingdb:2609}, pstate 1, 1 to go 199 centisecs
Oct 08 2015 21:45:28 GMT: CRITICAL (partition): (partition.c:as_partition_migrate_tx:2127) in migrate_tx, pending writes never drained: {biddingdb:2609}
Oct 08 2015 21:45:28 GMT: WARNING (as): (signal.c::93) SIGABRT received, aborting Aerospike Community Edition build 3.5.15
Oct 08 2015 21:45:28 GMT: WARNING (as): (signal.c::95) stacktrace: found 9 frames
Oct 08 2015 21:45:28 GMT: WARNING (as): (signal.c::95) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x54) [0x46ea38]
Oct 08 2015 21:45:28 GMT: WARNING (as): (signal.c::95) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x36c30) [0x7fa77dce3c30]
Oct 08 2015 21:45:28 GMT: WARNING (as): (signal.c::95) stacktrace: frame 2: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7fa77dce3bb9]
Oct 08 2015 21:45:28 GMT: WARNING (as): (signal.c::95) stacktrace: frame 3: /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7fa77dce6fc8]
Oct 08 2015 21:45:28 GMT: WARNING (as): (signal.c::95) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x2a5) [0x4fd0ce]
Oct 08 2015 21:45:28 GMT: WARNING (as): (signal.c::95) stacktrace: frame 5: /usr/bin/asd(as_partition_migrate_tx+0x54d) [0x4db0f1]
Oct 08 2015 21:45:28 GMT: WARNING (as): (signal.c::95) stacktrace: frame 6: /usr/bin/asd(migrate_xmit_fn+0x719) [0x4d7f7b]
Oct 08 2015 21:45:28 GMT: WARNING (as): (signal.c::95) stacktrace: frame 7: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7fa77ebbc182]
Oct 08 2015 21:45:28 GMT: WARNING (as): (signal.c::95) stacktrace: frame 8: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fa77dda7fbd]
And the logs dir which seems quite reasonable:
$ date
Thu Oct 8 22:30:30 UTC 2015
$ ls -ltrah /var/log/aerospike
total 362M
-rw-r--r-- 1 root root 1.4M Sep 25 06:25 aerospike.log-20150925.gz
-rw-r--r-- 1 root root 2.2M Sep 26 06:49 aerospike.log-20150926.gz
-rw-r--r-- 1 root root 2.1M Sep 27 06:45 aerospike.log-20150927.gz
-rw-r--r-- 1 root root 2.1M Sep 28 06:34 aerospike.log-20150928.gz
-rw-r--r-- 1 root root 2.1M Sep 29 06:30 aerospike.log-20150929.gz
-rw-r--r-- 1 root root 3.1M Sep 30 06:43 aerospike.log-20150930.gz
-rw-r--r-- 1 root root 4.7M Oct 1 06:42 aerospike.log-20151001.gz
-rw-r--r-- 1 root root 7.8M Oct 2 06:35 aerospike.log-20151002.gz
-rw-r--r-- 1 root root 8.5M Oct 3 06:39 aerospike.log-20151003.gz
-rw-r--r-- 1 root root 29M Oct 4 06:30 aerospike.log-20151004.gz
-rw-r--r-- 1 root root 142M Oct 5 06:41 aerospike.log-20151005.gz
-rw-r--r-- 1 root root 130M Oct 6 06:52 aerospike.log-20151006.gz
-rw-r--r-- 1 root root 2.7M Oct 7 06:43 aerospike.log-20151007.gz
-rw-r--r-- 1 root root 2.7M Oct 8 06:30 aerospike.log-20151008.gz
drwxr-xr-x 2 aerospike aerospike 4.0K Oct 8 06:30 .
-rw-r--r-- 1 root root 25M Oct 8 21:45 aerospike.log
drwxrwxr-x 15 root syslog 4.0K Oct 8 21:45 ..
Any thoughts about why aerospike went down?
Thanks!