CRITICAL (migrate): (migrate.c:migrate_tree_reduce:1872) malloc

Server version: 3.5.8

Hey guys,

We are currently running a 10-node cluster on Amazon EC2 i2.2xlarge instances. Recently we’ve been getting a malloc error on random nodes, which causes them to shut down. This is the error:

Apr 16 2016 00:38:22 GMT: CRITICAL (migrate): (migrate.c:migrate_tree_reduce:1872) malloc

Apr 16 2016 00:38:22 GMT: WARNING (as): (signal.c::93) SIGABRT received, aborting Aerospike Community Edition build 3.5.8

Apr 16 2016 00:38:22 GMT: WARNING (as): (signal.c::95) stacktrace: found 11 frames

Apr 16 2016 00:38:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x59) [0x46d6c5]

Apr 16 2016 00:38:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 1: /lib64/libc.so.6(+0x35650) [0x7fb3b89b0650]

Apr 16 2016 00:38:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 2: /lib64/libc.so.6(gsignal+0x37) [0x7fb3b89b05d7]

Apr 16 2016 00:38:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 3: /lib64/libc.so.6(abort+0x148) [0x7fb3b89b1cc8]

Apr 16 2016 00:38:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x271) [0x4ff298]

Apr 16 2016 00:38:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 5: /usr/bin/asd(migrate_tree_reduce+0x4f0) [0x4d9581]

Apr 16 2016 00:38:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 6: /usr/bin/asd() [0x459654]

Apr 16 2016 00:38:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 7: /usr/bin/asd(as_migrate_tree+0x9d) [0x4d9620]

Apr 16 2016 00:38:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 8: /usr/bin/asd(migrate_xmit_fn+0x903) [0x4da646]

Apr 16 2016 00:38:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 9: /lib64/libpthread.so.0(+0x7df5) [0x7fb3b987edf5]

Apr 16 2016 00:38:22 GMT: WARNING (as): (signal.c::95) stacktrace: frame 10: /lib64/libc.so.6(clone+0x6d) [0x7fb3b8a71bfd]

It had been working very well for a couple of months, so we are not sure why this started happening. Do you guys have any insight into this issue?

Thanks, Phu

It appears malloc failed to allocate, which may indicate the node was out of memory. Prior to 3.7.5, migration would load an entire partition into memory before shipping it; see AER-4667 in the release notes. A handful of other memory-related issues have also been fixed since your version.

This may be related to the thread “Cannot allocate memory but there is still memory available”.

Hey kporter,

Thanks for your reply!

I’m monitoring memory usage, but I never see it come close to 55G, which is the disk-space we specified for our namespace. Usually it’s about 27 to 30 GB, and then the error suddenly appears. Should I see memory go up while the migration is happening?

Thanks, Phu

The “CRITICAL (migrate): (migrate.c:migrate_tree_reduce:1872) malloc” message is only logged when the malloc fails. Low memory is a likely culprit but not the only possibility. It could also be caused by memory fragmentation or corruption.

Can you describe your memory monitoring?
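
When lining up monitoring data against the crashes, it helps to have the exact failure times from each node. Here is a minimal sketch, assuming the log line format shown in the first post. The function name find_malloc_failures is made up for illustration and is not part of any Aerospike tool.

```python
import re
from datetime import datetime, timezone

# Pattern for lines shaped like the ones posted above, e.g.
# "Apr 16 2016 00:38:22 GMT: CRITICAL (migrate): (migrate.c:migrate_tree_reduce:1872) malloc"
LOG_LINE = re.compile(
    r"^(\w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}) GMT: "
    r"(\w+) \((\w+)\): \(([^:]*):([^:]*):(\d+)\) (.*)$"
)


def find_malloc_failures(lines):
    """Return (timestamp, function, source_line) for each CRITICAL malloc line.

    Hypothetical helper: it only knows the format seen in this thread.
    """
    hits = []
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue  # blank line or a format this sketch does not know
        stamp, level, _context, _file, func, line_no, message = match.groups()
        if level == "CRITICAL" and message.startswith("malloc"):
            when = datetime.strptime(stamp, "%b %d %Y %H:%M:%S")
            hits.append((when.replace(tzinfo=timezone.utc), func, int(line_no)))
    return hits
```

Running this over each node's log (for example `find_malloc_failures(open(path))`, where path is wherever your log lives) gives a list of failure times to compare with the memory samples.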

Hey kporter,

I monitor using the free -g command. Over the weekend the issue didn’t happen, and the migrations are almost complete. Do you think this could be some network issue?

Thanks, Phu

I wouldn’t be able to determine that from this output. I would suspect memory fragmentation, since these are rather large allocs; the size of this alloc was part of why we moved away from this model in 3.7.5.
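
On the monitoring side: free -g reports whole gibibytes, so a short spike during migration is easy to miss. As a rough sketch (not an Aerospike tool), the same numbers can be sampled at finer granularity from /proc/meminfo on Linux. The helper names parse_meminfo and summarize are hypothetical, and MemAvailable is not present on older kernels, so it is treated as optional. CommitLimit and Committed_AS are included because with strict overcommit (vm.overcommit_memory=2) an allocation can fail while free memory still looks plentiful.

```python
KB_PER_GIB = 1024 * 1024


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are in kB; a few (such as HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def _gib(kb):
    """Convert kB to GiB, passing through None for missing fields."""
    return None if kb is None else kb / KB_PER_GIB


def summarize(fields):
    """Pick out the values most relevant to a failing malloc (hypothetical helper)."""
    limit = fields.get("CommitLimit")
    committed = fields.get("Committed_AS")
    headroom = None
    if limit is not None and committed is not None:
        headroom = limit - committed  # negative means over the commit limit
    return {
        "total_gib": _gib(fields.get("MemTotal")),
        "available_gib": _gib(fields.get("MemAvailable")),
        "commit_headroom_gib": _gib(headroom),
    }
```

Calling `summarize(parse_meminfo(open("/proc/meminfo").read()))` every few seconds during a migration and logging the result with a timestamp should show whether memory really stays flat right up to the abort.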