Critical SIGABRT error

Hi again, our two node cluster just died (both nodes simultaneously!) with a SIGABRT. Both are running in Docker Container with Debian Squeeze as host. Aerospike Enterprise Edition build 3.5.12.

Here is the log:

Jun 19 2015 23:15:19 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /opt/aerospike/data/maps.dat: used 25088, contig-free 10233M (10233 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 5 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
Jun 19 2015 23:15:19 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /opt/aerospike/data/user.dat: used 17408, contig-free 10238M (10238 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 1 (0.0/s)
Jun 19 2015 23:15:21 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /opt/aerospike/data/product.dat: used 105384866432, contig-free 279329M (279329 wblocks), swb-free 16, n-w 0, w-q 0 w-tot 153244 (0.5/s), defrag-q 0 defrag-tot 36681 (0.2/s)
Jun 19 2015 23:15:22 GMT: INFO (info): (thr_info.c::4796)  system memory: free 58408060kb ( 88 percent free ) 
Jun 19 2015 23:15:22 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 2 ::: objects 117355580 ::: sub_objects 0
Jun 19 2015 23:15:22 GMT: INFO (info): (thr_info.c::4812)  rec refs 117355580 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Jun 19 2015 23:15:22 GMT: INFO (info): (thr_info.c::4818)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Jun 19 2015 23:15:22 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (84, 8939, 8855) : hb (4, 4, 0) : fab (30, 30, 0)
Jun 19 2015 23:15:22 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 213025 : foreign 213043
Jun 19 2015 23:15:22 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 176980 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Jun 19 2015 23:15:22 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Jun 19 2015 23:15:22 GMT: INFO (info): (thr_info.c::4885) namespace product: disk inuse: 105384866432 memory inuse: 7510751680 (bytes) sindex memory inuse: 0 (bytes) avail pct 69 cache-read pct 1.86
Jun 19 2015 23:15:22 GMT: INFO (info): (thr_info.c::4885) namespace maps: disk inuse: 25088 memory inuse: 2816 (bytes) sindex memory inuse: 0 (bytes) avail pct 99 cache-read pct 0.00
Jun 19 2015 23:15:22 GMT: INFO (info): (thr_info.c::4860) namespace user: disk inuse: 17408 memory inuse: 12470 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
Jun 19 2015 23:15:22 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 6120 sync 6168 desync 0 zombie 0 wait 0 absent 0
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::137) histogram dump: reads (9313153 total) msec
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::154)  (00: 0009080137) (01: 0000069890) (02: 0000068304) (03: 0000066911)
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::154)  (04: 0000025265) (05: 0000001276) (06: 0000001019) (07: 0000000341)
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::163)  (08: 0000000010)
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (312554 total) msec
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::154)  (00: 0000053848) (01: 0000043211) (02: 0000052796) (03: 0000079480)
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::154)  (04: 0000075319) (05: 0000007479) (06: 0000000369) (07: 0000000047)
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::163)  (08: 0000000005)
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (4616575 total) msec
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::154)  (00: 0004354682) (01: 0000045140) (02: 0000053622) (03: 0000079821)
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::154)  (04: 0000075381) (05: 0000007495) (06: 0000000382) (07: 0000000047)
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::163)  (08: 0000000005)
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Jun 19 2015 23:15:22 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Jun 19 2015 23:15:22 GMT: CRITICAL (demarshal): (thr_demarshal.c:thr_demarshal:336) accept: Software caused connection abort (errno 103)
Jun 19 2015 23:15:22 GMT: WARNING (as): (signal.c::93) SIGABRT received, aborting Aerospike Enterprise Edition build 3.5.12
Jun 19 2015 23:15:22 GMT: WARNING (as): (signal.c::95) stacktrace: found 8 frames
Jun 19 2015 23:15:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x54) [0x489e03]
Jun 19 2015 23:15:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x321e0) [0x7f32257f81e0]
Jun 19 2015 23:15:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 2: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f32257f8165]
Jun 19 2015 23:15:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 3: /lib/x86_64-linux-gnu/libc.so.6(abort+0x180) [0x7f32257fb3e0]
Jun 19 2015 23:15:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x229) [0x51da3b]
Jun 19 2015 23:15:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 5: /usr/bin/asd(thr_demarshal+0x5ef) [0x49f2ea]
Jun 19 2015 23:15:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 6: /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7f3226612b50]
Jun 19 2015 23:15:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 7: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f32258a195d]

Any idea what might have caused this? Let us know if you need any more information.

Looks like errno 103 some sort of socket error.

More info on ECONNABORTED error code 103:

Was there any network disturbance on the host machine?

Thanks lucien. We are unable to verify if there was a short network disturbance (nothing in any logs/monitoring). In any case, why should such an event cause the Aerospike Cluster to crash?

We’ll need to look deeper into the root cause. Could you provide your dockerfile and let us which kernel and OS docker is running on? Also is this running on multicast?

Sure! Is there any way to send you this information more privately?

Done. Please check your email!

Looks like you are using Debian Squeeze (Debian 6) as per your initial post.

docker.io sites recommend the following for running Docker on Debian:

Docker is supported on the following versions of Debian:

Debian 8.0 Jessie (64-bit) Debian 7.7 Wheezy (64-bit)

Docker requires Kernel 3.8+, while Wheezy ships with Kernel 3.2 (for more details on why 3.8 is required, see discussion on bug Docker bug#407.)

Would it be possible for you to upgrade the OS?