I’m trying to run a fairly large query on an index in my sample database. I have 40 million entries and a number of distinct values for the bin I’m querying. My namespace runs in a memory-only configuration with no persistence. I’m running Aerospike version 3.3.21 on Red Hat 6.4 (x86_64).
# gdb aql
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
(gdb) r
Starting program: /usr/bin/aql
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff7de5700 (LWP 27497)]
Aerospike Query
Copyright 2013 Aerospike. All rights reserved.
aql> show indexes
+--------+--------------+---------+----------+-------+-------------+------------+--------------+
| ns | bins | set | num_bins | state | indexname | sync_state | type |
+--------+--------------+---------+----------+-------+-------------+------------+--------------+
| "test" | "filesize" | "demo2" | 1 | "RW" | "fsize2" | "synced" | "INT SIGNED" |
| "test" | "originalId" | "demo2" | 1 | "RW" | "original" | "synced" | "TEXT" |
| "test" | "stamp" | "demo2" | 1 | "RW" | "timestamp" | "synced" | "INT SIGNED" |
+--------+--------------+---------+----------+-------+-------------+------------+--------------+
3 rows in set (0.001 secs)
aql> select stamp from test.demo2 where stamp between 1414885153 and 1420000000;
<snip large outputs>
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff0083700 (LWP 27535)]
0x0000003c532747fa in _IO_default_xsputn_internal () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install aerospike-tools-3.3.22-1.el6.x86_64
(gdb) bt
#0 0x0000003c532747fa in _IO_default_xsputn_internal () from /lib64/libc.so.6
#1 0x0000003c532443a9 in vfprintf () from /lib64/libc.so.6
#2 0x0000003c53269889 in vsprintf () from /lib64/libc.so.6
#3 0x0000003c5324f538 in sprintf () from /lib64/libc.so.6
#4 0x0000000000472277 in as_integer_val_tostring (v=<value optimized out>) at src/main/aerospike/as_integer.c:86
#5 0x0000000000461c4c in each_bin (name=0x7fffef684aa0 "stamp", val=0x7fffef684ab0, udata=0x82fb90) at src/main/renderer/table.c:149
#6 0x000000000046e758 in as_record_foreach (rec=0x7ffff0082c30, callback=0x461aaa <each_bin>, udata=0x82fb90) at src/main/aerospike/as_record.c:520
#7 0x0000000000461a6a in as_rec_foreach (rec=0x7ffff0082c30, callback=0x461aaa <each_bin>, udata=0x82fb90) at /home/citrusleaf/BUILD/aerospike-client-c/modules/common/src/include/aerospike/as_rec.h:632
#8 0x0000000000462558 in render (val=0x7ffff0082c30, view=0x82fb90) at src/main/renderer/table.c:313
#9 0x000000000046270a in citrusleaf_query_foreach_callback (v=<value optimized out>, udata=<value optimized out>) at src/main/citrusleaf/cl_query.c:1451
#10 0x00000000004632d8 in cl_query_worker_do (node=0x7f03e0, task=<value optimized out>) at src/main/citrusleaf/cl_query.c:940
#11 0x0000000000463360 in cl_query_worker (pv_asc=0x7f0150) at src/main/citrusleaf/cl_query.c:1023
#12 0x0000003c53607851 in start_thread () from /lib64/libpthread.so.0
#13 0x0000003c532e890d in clone () from /lib64/libc.so.6
Is this normal? I have reproduced the same issue with the C client library by customizing one of the examples you ship with it.
Does this indicate a limit on the number of items I can fetch at one time, or is it a bug in the integer-to-string conversion when the value is passed to sprintf in frame 3 (that is, merely a display problem)?
I’m sorry to hear that you encountered this crash while running a range query via aql. As you have already observed, the issue is not with the tool but with the client. It looks like potential stack corruption and not just a display bug. I’ve filed a ticket for this problem. We’ll let you know as soon as a fix is available.
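As a standalone illustration of why the conversion by itself is an unlikely culprit: a 64-bit integer needs at most 20 characters plus a terminating NUL in decimal, so a bounded conversion cannot overrun a 21-byte buffer whatever the value is. The sketch below is not the client's as_integer_val_tostring code; the helper name format_i64 and the constant I64_STR_MAX are assumptions made for this example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* "-9223372036854775808" is 20 characters; add 1 for the NUL. */
#define I64_STR_MAX 21

/* Hypothetical helper (not part of the Aerospike client):
 * writes v in decimal into buf, never writing more than len bytes.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the buffer is too small to hold the whole value. */
static int format_i64(int64_t v, char *buf, size_t len)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "%" PRId64, v);
    if (n < 0 || (size_t)n >= len) {
        return -1; /* encoding error or truncated output */
    }
    return n;
}
```

With a valid value and a valid destination this cannot fault, so a SIGSEGV inside sprintf is more consistent with the value pointer or the destination buffer already being damaged by the time the conversion runs.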
I modified the example under “examples/query_examples/simple/src”. The modified file is here:
The main changes were:
Removed the index creation, record insertion, and similar setup code.
Removed the cleanup function.
Added as_query_select() to choose which bins to fetch from each record.
Changed the predicate to integer_range().
Removed the record dumping from query_callback.
I noticed that the crash is different, though:
#0 0x000000000040c192 in as_record_bin_forupdate (rec=0x7ffff50d2c90, name=0x7ffff50cd990 "stamp") at src/main/aerospike/as_record.c:101
#1 0x000000000040c5b0 in as_record_set_int64 (rec=<optimized out>, name=0x7ffff50cd990 "stamp", value=1414136640) at src/main/aerospike/as_record.c:239
#2 0x0000000000424ed5 in clbin_to_asrecord (bin=0x7ffff50cd990, r=0x7ffff50d2c90) at src/main/aerospike/_shim.c:298
#3 0x0000000000424f28 in clbins_to_asrecord (bins=0x7ffff50cd990, nbins=<optimized out>, r=0x7ffff50d2c90) at src/main/aerospike/_shim.c:338
#4 0x000000000042019f in cl_query_worker_do (node=0x641390, task=0x7ffff50d2e60) at src/main/citrusleaf/cl_query.c:860
#5 0x0000000000420494 in cl_query_worker (pv_asc=0x641010) at src/main/citrusleaf/cl_query.c:1018
#6 0x00007ffff77e9e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7 0x00007ffff6ecaccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x0000000000000000 in ?? ()
By the way, I was able to reproduce the same error using a custom stream UDF.
aql> aggregate query.my_stream_udf() on test.demo2 where stamp between 0 and 1411827150;
[New Thread 0x7ffff1485700 (LWP 7581)]
[New Thread 0x7ffff0a84700 (LWP 7582)]
[New Thread 0x7ffff0083700 (LWP 7583)]
[New Thread 0x7fffef682700 (LWP 7584)]
[New Thread 0x7fffeec81700 (LWP 7585)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff0083700 (LWP 7583)]
0x0000003c5327611c in malloc_consolidate () from /lib64/libc.so.6
(gdb) bt
#0 0x0000003c5327611c in malloc_consolidate () from /lib64/libc.so.6
#1 0x0000003c532797cb in _int_malloc () from /lib64/libc.so.6
#2 0x0000003c5327a911 in malloc () from /lib64/libc.so.6
#3 0x0000000000472214 in as_integer_new (value=278929695) at src/main/aerospike/as_integer.c:59
#4 0x0000000000487de1 in as_unpack_integer (pk=<value optimized out>, val=0x7fffef684130) at src/main/aerospike/as_msgpack.c:407
#5 as_unpack_val (pk=<value optimized out>, val=0x7fffef684130) at src/main/aerospike/as_msgpack.c:520
#6 0x0000000000487fc5 in as_unpack_map (pk=0x7fffef684180, size=2, val=0x7fffef6841d8) at src/main/aerospike/as_msgpack.c:457
#7 0x0000000000472a9e in as_msgpack_serializer_deserialize (s=<value optimized out>, buff=<value optimized out>, v=<value optimized out>) at src/main/aerospike/as_msgpack_serializer.c:146
#8 0x00000000004659bf in as_serializer_deserialize (bin=0x7ffff0081970, r=0x7ffff0082c30) at /home/citrusleaf/BUILD/aerospike-client-c/modules/common/target/Linux-x86_64/include/aerospike/as_serializer.h:87
#9 clbin_to_asrecord (bin=0x7ffff0081970, r=0x7ffff0082c30) at src/main/aerospike/_shim.c:319
#10 0x0000000000465a62 in clbins_to_asrecord (bins=<value optimized out>, nbins=<value optimized out>, r=0x7ffff0082c30) at src/main/aerospike/_shim.c:339
#11 0x0000000000463075 in cl_query_worker_do (node=0x7f03e0, task=<value optimized out>) at src/main/citrusleaf/cl_query.c:865
#12 0x0000000000463360 in cl_query_worker (pv_asc=0x7f0150) at src/main/citrusleaf/cl_query.c:1023
#13 0x0000003c53607851 in start_thread () from /lib64/libpthread.so.0
#14 0x0000003c532e890d in clone () from /lib64/libc.so.6
I think it’s related to the previous segfaults, since the problem only happens when I use large ranges. If I decrease the query range, the issue does not reproduce.
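Since smaller ranges do not trigger the crash, one way to narrow down the failing size (or to work around the problem for now) is to split the large range into consecutive windows and query each window separately. The following is only a minimal sketch of the splitting logic: for_each_window, range_fn and sum_width are hypothetical names, and sum_width is a stand-in for a function that would run the actual range query on one window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: handles one inclusive window [lo, hi],
 * e.g. by running a range query on it. Returns 0 on success. */
typedef int (*range_fn)(int64_t lo, int64_t hi, void *udata);

/* Splits the inclusive range [lo, hi] into consecutive windows holding
 * at most `width` values each and calls fn on every window in order.
 * Returns the number of windows processed, or -1 on invalid arguments
 * or if fn reports a failure. */
static int64_t for_each_window(int64_t lo, int64_t hi, int64_t width,
                               range_fn fn, void *udata)
{
    if (lo > hi || width <= 0 || fn == NULL) {
        return -1;
    }
    int64_t count = 0;
    int64_t start = lo;
    for (;;) {
        /* Distance to hi, computed in unsigned arithmetic so that very
         * wide ranges cannot overflow a signed subtraction. */
        uint64_t remaining = (uint64_t)hi - (uint64_t)start;
        int64_t end = (remaining < (uint64_t)(width - 1))
                          ? hi
                          : start + (width - 1);
        assert(end >= start);
        if (fn(start, end, udata) != 0) {
            return -1;
        }
        count++;
        if (end == hi) {
            break;
        }
        start = end + 1; /* safe: end < hi here */
    }
    return count;
}

/* Stand-in callback for this sketch: adds up how many values each
 * window covers instead of running a real query. */
static int sum_width(int64_t lo, int64_t hi, void *udata)
{
    *(int64_t *)udata += hi - lo + 1;
    return 0;
}
```

Replacing sum_width with a callback that issues the real query for each window, and then growing the window width until the crash reappears, would give a rough idea of the result size at which the problem starts.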
I’m sorry about the delay. I’m not able to confirm that we have a fix for this issue, because I have not been able to reproduce it with the exact same backtrace.
Can you tell me the range of the large result set you are parsing? How many entries exist, and how many do you expect to get back from the range query?
In the meantime, we have released a new C client with a potential fix for crashes when parsing large result sets.
Can you try it and let me know if you still see the crash?