ASD restarting constantly



Title

ASD restarting constantly with error message “Oct 04 2016 17:54:51 GMT: CRITICAL (xdr): (xdr_serverside.c:997) Out of memory.”

Problem Description

On server versions prior to 3.9.0, you may run into a situation where ASD constantly restarts itself after aborting with a SIGABRT, as in the following log:

Oct 04 2016 17:54:51 GMT: CRITICAL (xdr): (xdr_serverside.c:997) Out of memory.
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:96) SIGABRT received, aborting Aerospike Enterprise Edition build 3.8.3 os ubuntu14.04
Oct 04 2016 17:54:51 GMT: WARNING (xdr): (xdr.c:4973) Got Signal 'Aborted'.
Oct 04 2016 17:54:51 GMT: INFO (xdr): (xdr.c:4976) Digestinfo logged to disk=42697, unlogged=0
Oct 04 2016 17:54:51 GMT: INFO (xdr): (xdr.c:4994) Flushed all log records to disk
Oct 04 2016 17:54:51 GMT: INFO (cf:rbuffer): (xdr.c:5005) Stats [13400:0:1:135:1]
Oct 04 2016 17:54:51 GMT: INFO (cf:rbuffer): (xdr.c:5005) Current ver [1] sptr [7089318:0] rptr [7089451:100] | wptr [7089744:97] | rctx [1:7089451:100] | wctx [1:7089744:97] | maxseg [13415065]
Oct 04 2016 17:54:51 GMT: INFO (xdr): (xdr.c:5007) XDR is stopped.
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: found 12 frames
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x35) [0x4bca08]
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x354a0) [0x7feca565e4a0]
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: frame 2: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x7feca565e418]
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: frame 3: /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7feca566001a]
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x321) [0x5a30cd]
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: frame 5: /usr/bin/asd(xdr_read_record+0x469) [0x47cd56]
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: frame 6: /usr/bin/asd(xdr_handle_txn+0x7d) [0x55ed06]
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: frame 7: /usr/bin/asd(as_xdr_handle_txn+0x10) [0x47cf71]
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: frame 8: /usr/bin/asd(process_transaction+0x30c) [0x510976]
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: frame 9: /usr/bin/asd(thr_tsvc+0x51) [0x511146]
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: frame 10: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa) [0x7feca68c06fa]
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: frame 11: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7feca572fb5d]
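To check whether a node's log matches this crash signature, here is a minimal Python sketch. It assumes the log is available as a string and that other builds word these lines the same way as the excerpt above. The function names are hypothetical. The version check simply compares the build number in the SIGABRT line against 3.9.0.

```python
import re
from typing import Optional, Tuple

# Patterns based on the log excerpt above (build 3.8.3); other builds may differ.
OOM_LINE = re.compile(r"CRITICAL \(xdr\):.*Out of memory")
READ_RECORD_FRAME = re.compile(r"stacktrace: frame \d+: \S*\(xdr_read_record\+")
BUILD = re.compile(r"aborting Aerospike \w+ Edition build (\d+)\.(\d+)\.(\d+)")

FIXED_IN = (3, 9, 0)  # per this article, 3.9.0 and later include the fix


def parse_build_version(log_text: str) -> Optional[Tuple[int, int, int]]:
    """Extract the server version from the SIGABRT line, if present."""
    match = BUILD.search(log_text)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_xdr_oom_crash(log_text: str) -> bool:
    """True if the log shows the XDR 'Out of memory' abort on a pre-3.9.0 build.

    If no build line is found, the signature match alone is reported.
    """
    if not (OOM_LINE.search(log_text) and READ_RECORD_FRAME.search(log_text)):
        return False
    version = parse_build_version(log_text)
    return version is None or version < FIXED_IN


SAMPLE_LOG = """\
Oct 04 2016 17:54:51 GMT: CRITICAL (xdr): (xdr_serverside.c:997) Out of memory.
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:96) SIGABRT received, aborting Aerospike Enterprise Edition build 3.8.3 os ubuntu14.04
Oct 04 2016 17:54:51 GMT: WARNING (as): (signal.c:100) stacktrace: frame 5: /usr/bin/asd(xdr_read_record+0x469) [0x47cd56]
"""

if __name__ == "__main__":
    print(is_xdr_oom_crash(SAMPLE_LOG))  # True
```

A True result only means the log matches the pattern. It does not identify which record triggered the abort.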

Explanation

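The “Out of memory” message logged by XDR is misleading: it does not by itself mean the node has run out of memory. The root cause is a record containing a list, map, or blob value that XDR fails on while reading the record (xdr_read_record, frame 5 of the stack trace). Serialization and deserialization were improved in later releases, and those improvements also fixed this issue.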

[AER-5079] - (XDR) Improve latency caused by shipping list/map with high nesting/elements.
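Because the trigger is a list or map with deep nesting or many elements, it can help to measure those two properties on suspect records. The following is a minimal sketch that assumes the bin value has already been read into plain Python lists and dicts (for example, through a client). The function name is hypothetical, and this article gives no specific depth or size limit, so any threshold you compare against is your own choice.

```python
from typing import Any, List, Tuple


def nesting_stats(value: Any) -> Tuple[int, int]:
    """Return (max_depth, total_elements) for a nested list/map value.

    Scalars (ints, strings, bytes, ...) have depth 0. Each list or dict adds
    one level, and every list item or map entry counts as one element.
    """
    max_depth = 0
    total = 0
    # Iterative walk so very deep values cannot hit Python's recursion limit.
    stack: List[Tuple[Any, int]] = [(value, 0)]
    while stack:
        item, depth = stack.pop()
        if isinstance(item, dict):
            children = list(item.values())  # map keys are treated as scalars
        elif isinstance(item, list):
            children = item
        else:
            continue
        depth += 1
        max_depth = max(max_depth, depth)
        total += len(children)
        stack.extend((child, depth) for child in children)
    return max_depth, total
```

For example, `nesting_stats([1, [2, [3]]])` returns `(3, 5)`: three levels of lists holding five elements in total.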

Solution

The solution is to upgrade to server version 3.9.0 or later. Alternatively, check whether a development build from the 3.8 branch that includes the fix is available (we had an unreleased 3.8.4-2 build with the fix, which resolved the issue for some customers).

Keywords

XDR memory crash sigabrt

Timestamp

10/04/2016