3.6.1 crash

blonkel · September 27, 2015, 12:13am

Hey,

We upgraded to 3.6.1 (Enterprise) today, and it seems pretty unstable.

Our cluster crashed with:

Sep 26 2015 21:48:35 GMT: CRITICAL (hb): (hb.c:as_hb_start_receiving:1338) unable to add socket 72 to epoll fd list: File exists


Sep 26 2015 21:48:36 GMT: WARNING (as): (signal.c::94) SIGABRT received, aborting Aerospike Enterprise Edition build 3.6.1 os debian7
Sep 26 2015 21:48:38 GMT: WARNING (as): (signal.c::96) stacktrace: found 8 frames
Sep 26 2015 21:48:39 GMT: WARNING (as): (signal.c::96) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x5d) [0x48ee59]
Sep 26 2015 21:48:40 GMT: WARNING (as): (signal.c::96) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x321e0) [0x7f7af99e31e0]
Sep 26 2015 21:48:40 GMT: WARNING (as): (signal.c::96) stacktrace: frame 2: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f7af99e3165]
Sep 26 2015 21:48:40 GMT: WARNING (as): (signal.c::96) stacktrace: frame 3: /lib/x86_64-linux-gnu/libc.so.6(abort+0x180) [0x7f7af99e63e0]
Sep 26 2015 21:48:40 GMT: WARNING (as): (signal.c::96) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x22a) [0x51c2d3]
Sep 26 2015 21:48:40 GMT: WARNING (as): (signal.c::96) stacktrace: frame 5: /usr/bin/asd(as_hb_thr+0xec8) [0x4e9708]
Sep 26 2015 21:48:40 GMT: WARNING (as): (signal.c::96) stacktrace: frame 6: /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7f7afa7fdb50]
Sep 26 2015 21:48:40 GMT: WARNING (as): (signal.c::96) stacktrace: frame 7: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7af9a8c95d]

Any ideas how to solve it? We downgraded to 3.6.0.
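
For readers hitting the same CRITICAL line: "File exists" is the standard error text for EEXIST, which Linux epoll returns when a file descriptor is added to an epoll instance that already has it registered. The sketch below reproduces only that error condition using Python's `select.epoll` on Linux. It is not Aerospike code, and it assumes (the log does not confirm this) that the heartbeat thread's failure came from such a duplicate add. The function name `double_register_errno` is made up for illustration.

```python
import errno
import select
import socket


def double_register_errno():
    """Register one socket with the same epoll instance twice.

    Returns the symbolic errno name raised by the second registration,
    or None if the second registration unexpectedly succeeds.
    """
    ep = select.epoll()
    a, b = socket.socketpair()
    try:
        # First registration (EPOLL_CTL_ADD) succeeds.
        ep.register(a.fileno(), select.EPOLLIN)
        try:
            # Second registration of the same fd fails.
            ep.register(a.fileno(), select.EPOLLIN)
        except OSError as exc:
            return errno.errorcode[exc.errno]
        return None
    finally:
        ep.close()
        a.close()
        b.close()


if __name__ == "__main__":
    print(double_register_errno())
```

If that assumption holds, one plausible trigger is a socket descriptor number being reused while the old registration is still in the epoll set, but the thread itself never identifies a root cause.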

psi · September 28, 2015, 9:03pm

Hello. Thanks for the report. Could you please send us the full log so we can see the configuration as well as the logs that happened prior to the crash?

Also, did you receive this same type of stack on multiple nodes?

psi · October 2, 2015, 2:02am

We have tried to reproduce the problem on 2- and 3-node Debian 7 and CentOS 6 mesh and multicast clusters, but the reported crash has not occurred. If you are still seeing this issue, could you please give us more information so we can help resolve it? Otherwise, please close this issue. Thanks.

blonkel · October 2, 2015, 11:28am

Hey,

I sent you a private message with a more detailed log file; that's all we have. We haven't tried upgrading to 3.6.1 again. Yes, we saw this crash on 3 cluster nodes.

We had 2 servers running. Here's a short schema:

server1 
ID1 ip:3000 v 3.6.0 
ID2 ip:4000 v 3.6.1

server2 
ID3 ip:3000 v 3.6.0 
ID4 ip:4000 v 3.6.1

During this crash, ID1, ID2, and ID4 crashed. In case it's relevant, all instances run in Docker.

Greetings Sascha

psi · October 3, 2015, 12:41am

Hello. Thanks for the info. We did not receive the private message containing the log file. Exactly who / which address did you send it to?

The fact that you are using Docker is an important clue. Are you using host networking? Does everything always work when using 3.6.0 for all 4 cluster nodes?

While I haven’t reproduced the crash using Docker yet, we can probably make progress on this issue if you can keep giving us more info. Thanks for your help!!

blonkel · October 4, 2015, 12:57pm

Hello,

I sent it to you again (as a private message here in this forum).

psi · October 6, 2015, 12:50am

Got it this time ~~ Thanks! Looking into the cause. Will let you know what I find.

psi · October 22, 2015, 12:07am

Hello. Are you using Amazon AWS? Either way, could you please give the kernel version and Linux distro you are using? Specifically, could you please give the output of “uname -a”? Thanks!

blonkel · October 22, 2015, 3:54pm

Hello psi,

Thanks for your further investigation. We're running on our own custom dedicated servers hosted at OVH.

We're running Debian 7 stable (including the latest updates).

About the kernel:

We're currently running a custom kernel (4.0.0).

The config is a copy of the Debian 7 stable kernel config; if you need it, I can upload it for you.

PS: Anyway, we have now updated to 3.6.3 and it seems stable so far ~

Greetings Sascha
