SIGSEGV received, aborting Aerospike Community Edition build 3.5.14 (AER-3780)

We have a fairly large cluster (10+ nodes) containing more than 100,000,000 objects. When we try to run a “full scan” using a Stream UDF written in Lua, the entire cluster immediately goes down (or a single node, if we use queryNode(…) instead of queryAggregate(…)).

All nodes go down with the following SIGSEGV:

Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::160) SIGSEGV received, aborting Aerospike Community Edition build 3.5.14
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_segv+0x54) [0x46e947]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x36150) [0x7fee8f2f5150]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 2: /lib/x86_64-linux-gnu/libpthread.so.0(pthread_mutex_lock+0x4) [0x7fee901bce84]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: found 16 frames
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 3: /usr/bin/asd(as_index_get_vlock+0x16) [0x45a1e0]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 4: /usr/bin/asd(as_record_get+0xcd) [0x46992c]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 5: /usr/bin/asd(udf_record_open+0xd0) [0x4c18fe]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 6: /usr/bin/asd(as_aggr_istream_read+0x1a3) [0x44d1cd]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 7: /usr/bin/asd() [0x51dfb4]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 8: /usr/bin/asd() [0x53f2f8]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 9: /usr/bin/asd(lua_pcall+0x30) [0x52efc0]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 10: /usr/bin/asd() [0x5185ab]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 11: /usr/bin/asd() [0x518d7c]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 12: /usr/bin/asd(as_aggr__process+0x279) [0x44cba7]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 13: /usr/bin/asd(tscan_partition_thr+0x36c) [0x4b7764]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 14: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fee901bae9a]
Jul 15 2015 20:29:57 GMT: WARNING (as): (signal.c::162) stacktrace: frame 15: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fee8f3b28bd]
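The frames above are not logged strictly in order (the "found 16 frames" line lands between frames 2 and 3), so it can help to pull them out and sort them before reading the call chain. The following is a minimal Python sketch for that, assuming the line layout shown in this log; the helper name `parse_frames` and the regular expression are illustrative and are not part of any Aerospike tooling.

```python
import re

# Matches one "stacktrace: frame N: binary(symbol+0xoff) [0xaddr]" log line.
# The symbol and the offset are both optional, as in "/usr/bin/asd() [0x51dfb4]".
FRAME_RE = re.compile(
    r"stacktrace: frame (\d+): (\S+?)\(([^)+]*)(?:\+0x[0-9a-fA-F]+)?\) "
    r"\[(0x[0-9a-fA-F]+)\]"
)


def parse_frames(log_text):
    """Return (frame_no, binary, symbol, address) tuples sorted by frame number."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            num, binary, symbol, addr = match.groups()
            # Frames without a resolved symbol are reported as "?".
            frames.append((int(num), binary, symbol or "?", addr))
    return sorted(frames)


if __name__ == "__main__":
    import sys

    # Usage: python parse_frames.py < aerospike.log
    for num, binary, symbol, addr in parse_frames(sys.stdin.read()):
        print(f"{num:2d}  {symbol:28s} {addr}  {binary}")
```

Read from the top, the sorted output for this crash runs from the signal handler through pthread_mutex_lock, as_index_get_vlock, as_record_get and udf_record_open up to as_aggr__process and tscan_partition_thr.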

Would it be possible to get the OS and kernel versions these nodes are running? Also, could you share your Lua code snippet?

Hi Luk-

While I don’t want to ask you to share this log, I would like to ask you to review it. Would you have a look at /var/log/messages? Review the section that corresponds to the timestamps of the failures (bearing in mind the difference between local time and GMT). Do you see anything regarding OOM (out of memory) errors?
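As a rough aid for that check, here is a minimal Python sketch that shifts the GMT crash time into local time and lists syslog lines near it that mention the kernel OOM killer. It assumes the traditional syslog timestamp prefix ("Jul 15 20:29:57", with no year), and the marker strings, the function name `find_oom_lines` and its parameters are assumptions for illustration; adjust them to whatever your distribution actually logs.

```python
from datetime import datetime, timedelta

# Substrings commonly seen in kernel OOM messages (assumed; adjust as needed).
OOM_MARKERS = ("out of memory", "oom-killer", "oom_kill")


def find_oom_lines(lines, crash_gmt, utc_offset_hours, year, window_minutes=5):
    """Return syslog lines mentioning OOM within +/- window of the crash.

    crash_gmt        -- crash time as logged by the server (GMT), naive datetime
    utc_offset_hours -- local time minus GMT for the host, e.g. 2 for UTC+2
    year             -- syslog timestamps carry no year, so it is supplied here
    """
    crash_local = crash_gmt + timedelta(hours=utc_offset_hours)
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        try:
            # The first 15 characters hold "Mon DD HH:MM:SS".
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped syslog line
        near_crash = abs(stamp - crash_local) <= window
        if near_crash and any(marker in line.lower() for marker in OOM_MARKERS):
            hits.append(line.rstrip("\n"))
    return hits
```

For the crash in this thread you would pass `datetime(2015, 7, 15, 20, 29, 57)` as `crash_gmt`, the host's own UTC offset, and the lines of /var/log/messages (for example via `open("/var/log/messages", errors="replace")`). An empty result only means none of the assumed markers appeared in that window.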

I hope this helps. Let us know what you find,

-DM

Hi Luk,

This is a known bug with scan aggregation that was recently introduced. We are actively trying to put out a patch as soon as possible.

@luk,

A JIRA has been filed to follow up on this; it’s AER-3780, just for your reference. Please stay tuned for updates on our progress.

Cheers,

Maud

@luk,

We just released Aerospike Server Community Edition 3.5.15. It’s available for download here.

Among other things, this release fixes AER-3780, the regression of scan aggregations introduced in v3.5.8.
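If you manage several clusters, a quick way to flag nodes that need the upgrade is to compare each node's build string against that range. The sketch below is a small Python helper under one assumption taken from this announcement: every build from 3.5.8 up to, but not including, 3.5.15 carries the regression. The function names are hypothetical, and how you obtain each node's build string is left to your own tooling.

```python
def parse_version(text):
    """Turn a dotted build string such as "3.5.14" into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def affected_by_aer_3780(build):
    """True if the build falls in the assumed affected range [3.5.8, 3.5.15)."""
    return (3, 5, 8) <= parse_version(build) < (3, 5, 15)


if __name__ == "__main__":
    for build in ("3.5.7", "3.5.8", "3.5.14", "3.5.15"):
        status = "upgrade needed" if affected_by_aer_3780(build) else "ok"
        print(f"{build}: {status}")
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "3.5.8" sorts after "3.5.14" lexicographically.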

You can view the full release notes of v3.5.15 here.