Receiving SIGFPE and Aerospike keeps restarting


#1

Hi, we are currently running Aerospike Server version 3.3.21 on 6 server nodes. Recently we’ve been encountering an issue where one of the nodes goes down and keeps trying to restart. Manually restarting with “sudo service aerospike restart” fixes the issue, but another node later goes down with the same problem. Here is the log that gets printed:

Apr 20 2015 19:12:35 GMT: WARNING (as): (signal.c::101) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_fpe+0x59) [0x46764e]
Apr 20 2015 19:12:35 GMT: WARNING (as): (signal.c::101) stacktrace: frame 1: /lib64/libc.so.6(+0x33c60) [0x7fee98f25c60]
Apr 20 2015 19:12:35 GMT: WARNING (as): (signal.c::101) stacktrace: frame 2: /usr/bin/asd() [0x50accb]
Apr 20 2015 19:12:35 GMT: WARNING (as): (signal.c::101) stacktrace: frame 3: /usr/bin/asd() [0x50b323]
Apr 20 2015 19:12:35 GMT: WARNING (as): (signal.c::101) stacktrace: frame 4: /usr/bin/asd(udf_apply_record+0x120) [0x4b925d]
Apr 20 2015 19:12:35 GMT: WARNING (as): (signal.c::101) stacktrace: frame 5: /usr/bin/asd(udf_rw_local+0x180) [0x4ba76d]
Apr 20 2015 19:12:35 GMT: WARNING (as): (signal.c::101) stacktrace: frame 6: /usr/bin/asd() [0x4a5d55]
Apr 20 2015 19:12:35 GMT: WARNING (as): (signal.c::101) stacktrace: frame 7: /usr/bin/asd(as_rw_start+0x24f) [0x4a71d5]
Apr 20 2015 19:12:35 GMT: WARNING (as): (signal.c::101) stacktrace: frame 8: /usr/bin/asd(process_transaction+0xd69) [0x4b17f3]
Apr 20 2015 19:12:35 GMT: WARNING (as): (signal.c::101) stacktrace: frame 9: /usr/bin/asd(thr_tsvc+0x1b) [0x4b2068]
Apr 20 2015 19:12:35 GMT: WARNING (as): (signal.c::101) stacktrace: frame 10: /lib64/libpthread.so.0(+0x7f18) [0x7fee99bbef18]
Apr 20 2015 19:12:35 GMT: WARNING (as): (signal.c::101) stacktrace: frame 11: /lib64/libc.so.6(clone+0x6d) [0x7fee98fd4b9d]
Apr 20 2015 19:12:35 GMT: WARNING (as): (signal.c::94) SIGFPE received, aborting Aerospike Community Edition build 3.3.21

Do you have any idea what might be causing this issue?

Thanks, Phu


#2

Hi Phu,

Sorry that you’re experiencing this issue. It appears to be one that has been fixed in Aerospike 3.5.4, where a potential crash could be caused by a divide-by-zero in the expiration thread’s stat calculation.
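For context on why a divide-by-zero shows up as SIGFPE: on x86 Linux, an integer division by zero in C raises a hardware exception that the kernel delivers to the process as SIGFPE, which matches the as_sig_handle_fpe frame in the log above. The usual fix for this class of bug is to check the divisor before dividing. Below is a minimal sketch of that guard. The function name safe_average and its parameters are hypothetical and are not taken from the Aerospike source.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not Aerospike code): compute an average for a
 * stats line. When nothing was counted, return 0 instead of dividing,
 * so the integer division never sees a zero divisor.
 */
uint64_t safe_average(uint64_t total, uint64_t count)
{
    if (count == 0) {
        return 0; /* nothing sampled: report 0 rather than divide */
    }
    return total / count; /* integer division, truncates toward zero */
}
```

With this guard, a call such as safe_average(7, 0) returns 0, whereas an unguarded 7 / 0 on integer operands would typically terminate the process with SIGFPE.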

Aerospike 3.5.4: http://www.aerospike.com/download/server/notes.html#3.5.4

As always, it’s recommended to test a new version in your test environment before deploying it to your production nodes.

Jerry


#3

Hi Jerry,

Thanks for your help! I will work on upgrading our Aerospike servers.

Phu