Aerospike sudden crash

crash

#1

I am running a 5 nodes cluster with version 3.7.0.2 and after some hours of usage all 5 instances are crashed. I have seen some other reports of crash in this version. Should I download the 3.7.1 version? Will it fix the crash?


#2

Here is one of the nodes log:

Jan 07 2016 11:27:30 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 9069051 records, 0 subrecords, /data/aerospike.dat 18%
Jan 07 2016 11:27:32 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 9213668 records, 0 subrecords, /data/aerospike.dat 19%
Jan 07 2016 11:27:34 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 9360446 records, 0 subrecords, /data/aerospike.dat 19%
Jan 07 2016 11:27:36 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 9506677 records, 0 subrecords, /data/aerospike.dat 19%
Jan 07 2016 11:27:38 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 9650076 records, 0 subrecords, /data/aerospike.dat 19%
Jan 07 2016 11:27:40 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 9794627 records, 0 subrecords, /data/aerospike.dat 20%
Jan 07 2016 11:27:42 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 9934363 records, 0 subrecords, /data/aerospike.dat 20%
Jan 07 2016 11:27:44 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 10079869 records, 0 subrecords, /data/aerospike.dat 20%
Jan 07 2016 11:27:46 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 10222682 records, 0 subrecords, /data/aerospike.dat 21%
Jan 07 2016 11:27:48 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 10366149 records, 0 subrecords, /data/aerospike.dat 21%
Jan 07 2016 11:27:50 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 10511360 records, 0 subrecords, /data/aerospike.dat 21%
Jan 07 2016 11:27:52 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 10656491 records, 0 subrecords, /data/aerospike.dat 22%
Jan 07 2016 11:27:54 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 10800173 records, 0 subrecords, /data/aerospike.dat 22%
Jan 07 2016 11:27:56 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 10940637 records, 0 subrecords, /data/aerospike.dat 22%
Jan 07 2016 11:27:58 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 11080928 records, 0 subrecords, /data/aerospike.dat 22%
Jan 07 2016 11:28:00 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 11225168 records, 0 subrecords, /data/aerospike.dat 23%
Jan 07 2016 11:28:02 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 11371641 records, 0 subrecords, /data/aerospike.dat 23%
Jan 07 2016 11:28:04 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 11515486 records, 0 subrecords, /data/aerospike.dat 23%
Jan 07 2016 11:28:06 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 11657907 records, 0 subrecords, /data/aerospike.dat 24%
Jan 07 2016 11:28:08 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 11801732 records, 0 subrecords, /data/aerospike.dat 24%
Jan 07 2016 11:28:10 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 11947059 records, 0 subrecords, /data/aerospike.dat 24%
Jan 07 2016 11:28:12 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 12084446 records, 0 subrecords, /data/aerospike.dat 25%
Jan 07 2016 11:28:14 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 12220179 records, 0 subrecords, /data/aerospike.dat 25%
Jan 07 2016 11:28:16 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 12360025 records, 0 subrecords, /data/aerospike.dat 25%
Jan 07 2016 11:28:18 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 12497573 records, 0 subrecords, /data/aerospike.dat 25%
Jan 07 2016 11:28:20 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 12632931 records, 0 subrecords, /data/aerospike.dat 26%
Jan 07 2016 11:28:22 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 12771527 records, 0 subrecords, /data/aerospike.dat 26%
Jan 07 2016 11:28:24 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 12918148 records, 0 subrecords, /data/aerospike.dat 26%
Jan 07 2016 11:28:26 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 13063614 records, 0 subrecords, /data/aerospike.dat 27%
Jan 07 2016 11:28:28 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 13201460 records, 0 subrecords, /data/aerospike.dat 27%
Jan 07 2016 11:28:30 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 13337450 records, 0 subrecords, /data/aerospike.dat 27%
Jan 07 2016 11:28:32 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 13481676 records, 0 subrecords, /data/aerospike.dat 27%
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::3202) device /data/aerospike.dat: read complete: UNIQUE 13593274 (REPLACED 0) (GEN 63) (EXPIRED 0) (MAX-TTL 0) records
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1072) ns test loading free & defrag queues
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1006) /data/aerospike.dat init defrag profile: 0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1096) /data/aerospike.dat init wblock free-q 220796, defrag-q 2
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::2373) ns test starting device maintenance threads
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1488) ns test starting write worker threads
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::923) ns test starting defrag threads
Jan 07 2016 11:28:34 GMT: INFO (as): (as.c::457) initializing services...
Jan 07 2016 11:28:34 GMT: INFO (tsvc): (thr_tsvc.c::819) shared queues: 32 queues with 32 threads each
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2649) Sending 10.240.0.14 as the IP address for receiving heartbeats
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2661) heartbeat socket initialization
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2675) initializing mesh heartbeat socket : 10.240.0.14:3002
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3454) partitions from storage: total 4096 found 4096 lost(set) 0 lost(unset) 0
Jan 07 2016 11:28:34 GMT: INFO (partition): (partition.c::3432) {test} 4096 partitions: found 0 absent, 4096 stored
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3458) Paxos service ignited: bb90e00f00a0142
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::609) Initialize batch-index-threads to 32
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::635) Created JEMalloc arena #151 for batch normal buffers
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::636) Created JEMalloc arena #152 for batch huge buffers
Jan 07 2016 11:28:34 GMT: INFO (batch): (thr_batch.c::347) Initialize batch-threads to 4
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::4147) {test} floor set at 1049 wblocks per device
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3539) listening for other nodes (max 3000 milliseconds) ...
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.6:3002
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.5:3002
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.6:3002 (10.240.0.6:3002) via socket 60 from 10.240.0.14:55702
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.5:3002 (10.240.0.5:3002) via socket 61 from 10.240.0.14:40626
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.23:3002 (10.240.0.23:3002) via socket 62 from 10.240.0.14:42802
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.13:3002 (10.240.0.13:3002) via socket 63 from 10.240.0.14:35384
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90500f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90600f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90500f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90600f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3547) ... other node(s) detected - node will operate in a multi-node cluster
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90500f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90600f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #8 for thr_demarshal()
Jan 07 2016 11:28:37 GMT: INFO (ldt): (thr_nsup.c::1139) LDT supervisor started
Jan 07 2016 11:28:37 GMT: INFO (nsup): (thr_nsup.c::1176) namespace supervisor started
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3516) paxos supervisor thread started
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::308) Service started: socket 0.0.0.0:3000
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90d00f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb91700f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90d00f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb91700f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90d00f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb91700f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]@bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is now principal pro tempore
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]@bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142 
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is still principal pro tempore
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2490) CLUSTER SIZE = 5
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2533) Global state is well formed
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2262) setting replication factors: cluster size 5, paxos single replica limit 1
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2278) {test} replication factor is 5
Jan 07 2016 11:28:38 GMT: INFO (config): (cluster_config.c::421) rack aware is disabled
Jan 07 2016 11:28:38 GMT: INFO (partition): (cluster_config.c::380) rack aware is disabled
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::3337) {test} re-balanced, expected migrations - (5789 tx, 6010 rx)
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3355) global partition state: total 4096 lost 0 unique 0 duplicate 4096
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3356) partition state after fixing lost partitions (master): total 4096 lost 0 unique 0 duplicate 4096
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3357) 0 new partition version tree paths generated
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::375) ALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::803) Node allows migrations. Ignoring duplicate partition sync message.
Jan 07 2016 11:28:38 GMT: WARNING (paxos): (paxos.c::3301) unable to apply partition sync message state
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #18 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #19 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #20 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #21 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #22 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #23 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #24 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #25 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #26 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #27 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #28 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #30 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #29 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #31 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #32 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #33 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #34 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #35 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #36 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #37 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #38 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #39 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #40 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #41 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #42 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #43 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #44 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #45 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #46 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #47 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #48 for thr_demarshal()
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::860) Waiting to spawn demarshal threads ...
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::863) Started 32 Demarshal Threads
Jan 07 2016 11:28:39 GMT: INFO (as): (as.c::494) service ready: soon there will be cake!
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5084)  system memory: free 6590544kb ( 86 percent free ) 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5090)  ClusterSize 5 ::: objects 13593274 ::: sub_objects 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5099)  rec refs 13596175 ::: rec locks 1 ::: trees 0 ::: wr reqs 0 ::: mig tx 2633 ::: mig rx 30
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5104)  replica errs :: null 0 non-null 0 ::: sync copy errs :: master 0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5114)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: iq 0 ::: dq 0 : fds - proto (22, 35, 13) : hb (4, 4, 0) : fab (72, 72, 0)
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5116)    heartbeat_received: self 0 : foreign 322
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5117)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5129)    tree_counts: nsup 0 scan 0 dup 0 wprocess 0 migrx 30 migtx 2633 ssdr 0 ssdw 0 rw 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5158) {test} disk bytes used 89561376640 : avail pct 71 : cache-read pct 0.00
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5160) {test} memory bytes used 869969536 (index 869969536 : sindex 0) : used pct 23.70
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5171) {test} ldt_gc: cnt 0 io 0 gc 0 (0, 0, 0)
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5194) {test} migrations - remaining (5777 tx, 5982 rx), active (1 tx, 2 rx), 0.34% complete
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5203)    partitions: actual 792 sync 3304 desync 0 zombie 0 absent 0
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5385) node id bb90e00f00a0142
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5389) reads 0,0 : writes 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5393) udf reads 0,0 : udf writes 0,0 : udf deletes 0,0 : lua errors 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5396) basic scans 0,0 : aggregation scans 0,0 : udf background scans 0,0 :: active scans 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5400) index (new) batches 0,0 : direct (old) batches 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5404) aggregation queries 0,0 : lookup queries 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5406) proxies 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5415) {test} objects 13593274 : sub-objects 0 : master objects 2625756 : master sub-objects 0 : prole objects 3126 : prole sub-objects 0
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c05441008 with fne: 0x7f7c03c0e108 and fd: 68 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e1b008 with fne: 0x7f7c03c0e108 and fd: 78 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e9d008 with fne: 0x7f7c03c0e108 and fd: 80 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07dda008 with fne: 0x7f7c03c0e108 and fd: 76 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07d99008 with fne: 0x7f7c03c0e108 and fd: 75 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07ede008 with fne: 0x7f7c03c0e108 and fd: 81 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e5c008 with fne: 0x7f7c03c0e108 and fd: 79 (Failed)
Jan 07 2016 11:28:54 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /data/aerospike.dat: used 89561376640, contig-free 220797M (220797 wblocks), swb-free 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 2 (0.1/s) defrag-w-tot 0 (0.0/s)
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: CRITICAL (demarshal): (thr_demarshal.c:thr_demarshal_resume:124) unable to resume socket FD -1 on epoll instance FD 115: 9 (Bad file descriptor)
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::94) SIGABRT received, aborting Aerospike Community Edition build 3.7.1 os ubuntu12.04
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: found 13 frames
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x5d) [0x48a07a]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x352f0) [0x7f7c3c97e2f0]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 2: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37) [0x7f7c3c97e267]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 3: /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f7c3c97feca]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x2a3) [0x516b1a]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 5: /usr/bin/asd(thr_demarshal_resume+0x8b) [0x49f473]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 6: /usr/bin/asd(as_end_of_transaction_ok+0x9) [0x4d58f4]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 7: /usr/bin/asd(write_request_destructor+0x132) [0x4c1c8e]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 8: /usr/bin/asd(cf_rchash_free+0x26) [0x541028]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 9: /usr/bin/asd(cf_rchash_reduce+0xb5) [0x541fe9]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 10: /usr/bin/asd(rw_retransmit_fn+0x44) [0x4c0eca]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 11: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76aa) [0x7f7c3dbe16aa]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 12: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7c3ca4feed]
Jan 07 2016 12:13:37 GMT: INFO (as): (as.c::410) <><><><><><><><><><>  Aerospike Community Edition build 3.7.1  <><><><><><><><><><>
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) # Aerospike database configuration file.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	user root
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	group root
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	pidfile /var/run/aerospike/asd.pid
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	service-threads 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	transaction-queues 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	transaction-threads-per-queue 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)         batch-index-threads 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	proto-fd-max 15000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)         batch-max-requests 200000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) logging {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	# Log file must be an absolute path.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	file /var/log/aerospike/aerospike.log {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 		context any info
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	}
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) network {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	service {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 		#address any
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 		port 3000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	}
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	heartbeat {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)                 mode mesh
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)                 mesh-seed-address-port 10.240.0.6 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)                 mesh-seed-address-port 10.240.0.5 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)                 port 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 		interval 150
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 		timeout 10
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	}
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	fabric {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 		port 3001
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	}
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	info {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 		port 3003
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	}
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) namespace test {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	replication-factor 10
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	memory-size 3500M
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	default-ttl 0 # 30 days, use 0 to never expire/evict.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)         ldt-enabled true
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 	storage-engine device {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)           file /data/aerospike.dat
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)           write-block-size 1M
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)           filesize 300G
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)         }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) 
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3265) system file descriptor limit: 100000, proto-fd-max: 15000
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::119) Node ip: 10.240.0.14
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::327) Heartbeat address for mesh: 10.240.0.14
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3309) Rack Aware mode not enabled
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3312) Node id bb90e00f00a0142
Jan 07 2016 12:13:37 GMT: INFO (namespace): (namespace_cold.c::101) ns test beginning COLD start
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3797) opened file /data/aerospike.dat: usable size 322122547200
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::1107) /data/aerospike.dat has 307200 wblocks of size 1048576
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3176) device /data/aerospike.dat: reading device to load index
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3181) In TID 13102: Using arena #150 for loading data for namespace "test"
Jan 07 2016 12:13:39 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 134133 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:41 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 258771 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:43 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 388121 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:45 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 512116 records, 0 subrecords, /data/aerospike.dat 1%
Jan 07 2016 12:13:47 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 641566 records, 0 subrecords, /data/aerospike.dat 1%
Jan 07 2016 12:13:49 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 770551 records, 0 subrecords, /data/aerospike.dat 1%

#3

I have upgraded some nodes to version 3.7.1 and got the same error. Is there any problem to run 3.7.0.2 together with 3.7.1?


#4

Hi Daniel, I checked this and we don’t have a record of this stack trace. Please can you give me os details and I will raise it internally?


#5

There should be no problem running both 3.7.0.2 together with 3.7.1. We have created AER-4705 for internal tracking on this issue.


#6

I’ve experienced this crash as well. I have a cluster of 4 nodes, and they all went down around the same time. I brought them back up, then about 30m later 1 node went down again. Here is one of the nodes log:

Feb 26 2016 22:09:11 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Feb 26 2016 22:09:14 GMT: WARNING (hb): (hb.c::516) as_hb_nodes_discovered_hash_del_conn node bb90f67561b460e fd:[62] not found
Feb 26 2016 22:09:14 GMT: WARNING (hb): (hb.c::516) as_hb_nodes_discovered_hash_del_conn node bb90f67561b460e fd:[119] not found
Feb 26 2016 22:09:15 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f9d6c92c008 with fne: 0x7f9d5640e068 and fd: 202 (Failed)
Feb 26 2016 22:09:15 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f9d6c96d008 with fne: 0x7f9d5640e068 and fd: 204 (Failed)
Feb 26 2016 22:09:15 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f9d6c8eb008 with fne: 0x7f9d5640e068 and fd: 197 (Failed)
Feb 26 2016 22:09:15 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f9d57038008 with fne: 0x7f9d5640e068 and fd: 72 (Failed)
Feb 26 2016 22:09:15 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f9d6c869008 with fne: 0x7f9d5640e068 and fd: 195 (Failed)
Feb 26 2016 22:09:15 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f9d6c8aa008 with fne: 0x7f9d5640e068 and fd: 196 (Failed)
Feb 26 2016 22:09:15 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f9d56c2e008 with fne: 0x7f9d5640e068 and fd: 73 (Failed)
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: CRITICAL (demarshal): (thr_demarshal.c:thr_demarshal_resume:124) unable to resume socket FD -1 on epoll instance FD 86: 9 (Bad file descriptor)
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (as): (signal.c::94) SIGABRT received, aborting Aerospike Community Edition build 3.7.1 os el6
Feb 26 2016 22:09:15 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Feb 26 2016 22:09:15 GMT: WARNING (as): (signal.c::96) stacktrace: found 11 frames
Feb 26 2016 22:09:15 GMT: WARNING (as): (signal.c::96) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x62) [0x48c49b]
Feb 26 2016 22:09:15 GMT: WARNING (as): (signal.c::96) stacktrace: frame 1: /lib64/libc.so.6(+0x35670) [0x7f9d8a93c670]
Feb 26 2016 22:09:15 GMT: WARNING (as): (signal.c::96) stacktrace: frame 2: /lib64/libc.so.6(gsignal+0x37) [0x7f9d8a93c5f7]
Feb 26 2016 22:09:15 GMT: WARNING (as): (signal.c::96) stacktrace: frame 3: /lib64/libc.so.6(abort+0x148) [0x7f9d8a93dce8]
Feb 26 2016 22:09:15 GMT: WARNING (as): (signal.c::96) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x2b0) [0x51ea60]
Feb 26 2016 22:09:15 GMT: WARNING (as): (signal.c::96) stacktrace: frame 5: /usr/bin/asd(thr_demarshal_resume+0x8b) [0x4a3ab2]
Feb 26 2016 22:09:15 GMT: WARNING (as): (signal.c::96) stacktrace: frame 6: /usr/bin/asd(as_end_of_transaction+0x10) [0x4da4ca]
Feb 26 2016 22:09:15 GMT: WARNING (as): (signal.c::96) stacktrace: frame 7: /usr/bin/asd() [0x465053]
Feb 26 2016 22:09:15 GMT: WARNING (as): (signal.c::96) stacktrace: frame 8: /usr/bin/asd(as_thread_worker+0x93) [0x5448b4]
Feb 26 2016 22:09:15 GMT: WARNING (as): (signal.c::96) stacktrace: frame 9: /lib64/libpthread.so.0(+0x7dc5) [0x7f9d8bb0fdc5]
Feb 26 2016 22:09:15 GMT: WARNING (as): (signal.c::96) stacktrace: frame 10: /lib64/libc.so.6(clone+0x6d) [0x7f9d8a9fdbdd]

I’m using the Amazon linux version of aerospike with ami-43295126. The version is upgraded to Aerospike Community Edition build 3.7.1.

cat /etc/system-release
# Amazon Linux AMI release 2015.09
uname -r
# 4.1.7-15.23.amzn1.x86_64

#7

AER-4705 was addressed in release 3.7.2: http://www.aerospike.com/download/server/notes.html#3.7.2