C client crash on close idle connections cb

My code performs async puts, and I am running 16 independent processes.

  • Cluster: 8 nodes, 256GB, HDD.
  • Client version: 4.6.13.el7.x86_64. I tried both libuv and libev and see the same problem with each.

Below is the backtrace. Has anyone seen this before?

(gdb) bt

#0 ck_pr_inc_32 (target=0x191) at modules/common/target/Linux-x86_64/include/aerospike/ck/gcc/x86_64/ck_pr.h:359

#1 as_node_reserve (node=0x191) at src/include/aerospike/as_node.h:477

#2 as_event_close_idle_connections_cb (event_loop=0x25702a0, state=0x7ffab00008c0) at src/main/aerospike/as_event.c:1311

#3 0x0000000000418b53 in as_ev_wakeup (loop=, wakeup=, revents=) at src/main/aerospike/as_event_ev.c:81

#4 0x000000000046a7cb in ev_invoke_pending ()

#5 0x000000000046b6a5 in ev_run ()

#6 0x000000000041888b in ev_loop (flags=0, loop=0x257c160) at /usr/local/include/ev.h:835

#7 as_ev_worker (udata=0x257c160) at src/main/aerospike/as_event_ev.c:98

#8 0x00007ffadf73cdc5 in start_thread () from /lib64/libpthread.so.0

#9 0x00007ffadeb4773d in clone () from /lib64/libc.so.6
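For readers following the trace: frames #0 and #1 show the same address, 0x191, passed both as the node pointer and as the counter being incremented. That is consistent with a reference count stored at the start of the node structure, and 0x191 (401 decimal) is far too small to be a valid heap address on Linux x86_64, so the increment itself faults. The sketch below is a minimal illustration of that pattern in C11. `demo_node` and its functions are hypothetical names chosen for this example; this is not the client's actual `as_node` code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cluster node; NOT the real as_node layout.
 * The counter is the first member, so &node->ref_count == (void *)node,
 * which mirrors target == node in frames #0 and #1 of the trace. */
typedef struct demo_node {
    atomic_uint ref_count;   /* number of current holders of this node */
    char name[32];
} demo_node;

_Static_assert(offsetof(demo_node, ref_count) == 0,
               "ref_count must sit at offset 0 for this illustration");

/* Allocate a node that starts with one reference (the creator's). */
demo_node *demo_node_create(const char *name)
{
    demo_node *node = malloc(sizeof *node);
    if (node == NULL) {
        return NULL;
    }
    atomic_init(&node->ref_count, 1);
    snprintf(node->name, sizeof node->name, "%s", name);
    return node;
}

/* Take an extra reference. This writes through the pointer, so a stale or
 * garbage value such as 0x191 faults right here, before any other work. */
void demo_node_reserve(demo_node *node)
{
    assert(node != NULL);
    atomic_fetch_add(&node->ref_count, 1);
}

/* Drop one reference; free the node when the last holder lets go.
 * Returns true if the node was freed by this call. */
bool demo_node_release(demo_node *node)
{
    assert(node != NULL);
    if (atomic_fetch_sub(&node->ref_count, 1) == 1) {
        free(node);
        return true;
    }
    return false;
}

/* Read the current reference count (for inspection only). */
unsigned demo_node_refs(demo_node *node)
{
    return atomic_load(&node->ref_count);
}
```

Calling `demo_node_reserve` on a live node raises its count from 1 to 2. Calling it on a pointer that was never a node, or that was overwritten, writes to unmapped memory and the process receives SIGSEGV, which matches what the trace shows at `ck_pr_inc_32`.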

A new GitHub C client branch “4.6.13.1” has been created with an important async retry fix.

Please try your application with this C client branch.

Hi Brian, thank you very much for your quick response and for putting out the fix within a few days! I will give it a try.

I reduced the number of processes from 8 to 4, and it still crashed. For each process, I set max_connections to 300. We have 6.3B records to insert, and the crash occurs at a random point partway through the load.

I believe this is related to async_put retry. When the cluster is empty, “fresh” put operations go through smoothly; I have never seen any problem with them. The performance of fresh puts is awesome: I see 1M+ puts per second.

The problem happens only when the PK already exists, i.e. when updating existing records. Performance drops by a factor of 3 to 5, and the client crashes frequently.
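To put those figures in perspective, here is a rough estimate of how long the full load would take. It is only a sketch: it assumes the 1M puts per second figure is the aggregate rate across all processes and that the rate stays constant, neither of which is stated above. The function name `hours_to_load` is made up for this example.

```c
/* Estimated hours to write `records` rows when fresh puts run at
 * `fresh_puts_per_sec` and the workload is `slowdown` times slower
 * (use 1.0 for fresh puts, 3.0 to 5.0 for updates as reported above). */
double hours_to_load(double records, double fresh_puts_per_sec, double slowdown)
{
    double fresh_seconds = records / fresh_puts_per_sec;
    return fresh_seconds / 3600.0 * slowdown;
}
```

Under those assumptions, `hours_to_load(6.3e9, 1.0e6, 1.0)` gives about 1.75 hours for fresh puts, while slowdowns of 3 and 5 give about 5.25 and 8.75 hours for updates, so a crash "in the middle" can cost several hours of work per attempt.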

Program terminated with signal 11, Segmentation fault.

#0 ck_pr_inc_32 (target=0x191) at modules/common/target/Linux-x86_64/include/aerospike/ck/gcc/x86_64/ck_pr.h:359

359 CK_PR_GENERATE(inc)

(gdb) bt

#0 ck_pr_inc_32 (target=0x191) at modules/common/target/Linux-x86_64/include/aerospike/ck/gcc/x86_64/ck_pr.h:359

#1 as_node_reserve (node=0x191) at src/include/aerospike/as_node.h:477

#2 as_event_close_idle_connections_cb (event_loop=0x1d182a0, state=0x7f58e80008c0) at src/main/aerospike/as_event.c:1379

#3 0x0000000000418e0d in as_ev_wakeup (loop=, wakeup=, revents=) at src/main/aerospike/as_event_ev.c:81

#4 0x000000000046ab0e in ev_invoke_pending ()

#5 0x000000000046b9e8 in ev_run ()

#6 0x0000000000418b3b in ev_loop (flags=0, loop=0x1d24160) at /usr/local/include/ev.h:835

#7 as_ev_worker (udata=0x1d24160) at src/main/aerospike/as_event_ev.c:98

#8 0x00007f592002bdc5 in start_thread () from /lib64/libpthread.so.0

#9 0x00007f591f43673d in clone () from /lib64/libc.so.6

(gdb)

I changed my code a lot, and the problem can no longer be reproduced, so don’t worry about it for now. Thanks for the help, Brian!
