Aql segmentation fault when receiving large result set

dlhero · November 20, 2014, 1:51pm

Hello,

I’m trying to run a rather large query against an index in my sample database. I have 40 million entries and a distinct number of values for the index I’m querying. My namespace runs in a memory-only configuration with no persistence. I’m using Aerospike version 3.3.21 on Red Hat 6.4 (x86_64).

# gdb aql
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
(gdb) r
Starting program: /usr/bin/aql 
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff7de5700 (LWP 27497)]
Aerospike Query
Copyright 2013 Aerospike. All rights reserved.

aql> show indexes
+--------+--------------+---------+----------+-------+-------------+------------+--------------+
| ns     | bins         | set     | num_bins | state | indexname   | sync_state | type         |
+--------+--------------+---------+----------+-------+-------------+------------+--------------+
| "test" | "filesize"   | "demo2" | 1        | "RW"  | "fsize2"    | "synced"   | "INT SIGNED" |
| "test" | "originalId" | "demo2" | 1        | "RW"  | "original"  | "synced"   | "TEXT"       |
| "test" | "stamp"      | "demo2" | 1        | "RW"  | "timestamp" | "synced"   | "INT SIGNED" |
+--------+--------------+---------+----------+-------+-------------+------------+--------------+
3 rows in set (0.001 secs)

aql> select stamp from test.demo2  where stamp between 1414885153 and 1420000000;
         <snip large outputs>
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff0083700 (LWP 27535)]
0x0000003c532747fa in _IO_default_xsputn_internal () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install aerospike-tools-3.3.22-1.el6.x86_64
(gdb) bt
#0  0x0000003c532747fa in _IO_default_xsputn_internal () from /lib64/libc.so.6
#1  0x0000003c532443a9 in vfprintf () from /lib64/libc.so.6
#2  0x0000003c53269889 in vsprintf () from /lib64/libc.so.6
#3  0x0000003c5324f538 in sprintf () from /lib64/libc.so.6
#4  0x0000000000472277 in as_integer_val_tostring (v=<value optimized out>) at src/main/aerospike/as_integer.c:86
#5  0x0000000000461c4c in each_bin (name=0x7fffef684aa0 "stamp", val=0x7fffef684ab0, udata=0x82fb90) at src/main/renderer/table.c:149
#6  0x000000000046e758 in as_record_foreach (rec=0x7ffff0082c30, callback=0x461aaa <each_bin>, udata=0x82fb90) at src/main/aerospike/as_record.c:520
#7  0x0000000000461a6a in as_rec_foreach (rec=0x7ffff0082c30, callback=0x461aaa <each_bin>, udata=0x82fb90) at /home/citrusleaf/BUILD/aerospike-client-c/modules/common/src/include/aerospike/as_rec.h:632
#8  0x0000000000462558 in render (val=0x7ffff0082c30, view=0x82fb90) at src/main/renderer/table.c:313
#9  0x000000000046270a in citrusleaf_query_foreach_callback (v=<value optimized out>, udata=<value optimized out>) at src/main/citrusleaf/cl_query.c:1451
#10 0x00000000004632d8 in cl_query_worker_do (node=0x7f03e0, task=<value optimized out>) at src/main/citrusleaf/cl_query.c:940
#11 0x0000000000463360 in cl_query_worker (pv_asc=0x7f0150) at src/main/citrusleaf/cl_query.c:1023
#12 0x0000003c53607851 in start_thread () from /lib64/libpthread.so.0
#13 0x0000003c532e890d in clone () from /lib64/libc.so.6

Is this normal? I have isolated the same issue using the C client library as well by customizing an example you ship with the library.

Does this indicate a limit on the number of items I can fetch at any given time, or is it a bug in the integer-to-string conversion when the value is passed to sprintf in frame 3 (that is, merely a display problem)?

Best Regards, Leonidas Tsampros
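
For readers weighing the "display problem" hypothesis above: the backtrace only shows that sprintf was reached from as_integer_val_tostring; the code at as_integer.c:86 is not shown in this thread. The following is a minimal sketch of how an int64 value can be formatted with an explicit buffer bound, so that an undersized buffer is reported instead of being overrun. The helper name int64_to_string and the INT64_STR_MAX constant are hypothetical and are not part of the Aerospike client.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Longest int64_t text is "-9223372036854775808": 20 characters plus NUL. */
#define INT64_STR_MAX 21

/*
 * Hypothetical helper (an assumption for illustration, not client code).
 * Formats value into buf, which holds size bytes.
 * Returns the number of characters written, excluding the NUL terminator,
 * or -1 if the buffer is too small or formatting fails.
 */
int int64_to_string(int64_t value, char *buf, size_t size)
{
    /* snprintf never writes more than size bytes, unlike sprintf. */
    int n = snprintf(buf, size, "%" PRId64, value);
    if (n < 0 || (size_t)n >= size) {
        return -1;
    }
    return n;
}
```

A caller would declare `char buf[INT64_STR_MAX];` and pass `sizeof buf` as the size. If a bounded conversion like this still crashed in the same place, that would point away from a formatting bug and toward memory that was already corrupted before the call.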

BhuvanRamK · November 22, 2014, 2:32am

Hi Leonidas,

Thanks for bringing this to our attention.

I’m sorry to note that you encountered this crash while running a range-query via aql. As you have already observed, the issue is not with the tool, but with the client. It looks like a potential stack-corruption and not just a display bug. I’ve filed a ticket for this problem. We’ll let you know as soon as this gets fixed and is available.

Sincerely, Bhuvana

dlhero · November 24, 2014, 9:27am

Hello Bhuvan,

Thanks for letting me know.

Looking forward to a fix.

Best Regards Leonidas Tsampros

BhuvanRamK · November 24, 2014, 8:40pm

Hi Leonidas,

Which C client example did you tweak, and what was the specific tweak that caused the crash?

Thanks Bhuvana

dlhero · November 25, 2014, 11:01am

Hello Bhuvana,

I modified the example under “examples/query_examples/simple/src”. The modified file is here:

https://gist.github.com/ltsampros/1db4059ab6fe01855ed7

modified-simple-example.c

I mainly made the following changes:

Removed the parts that handle index creation, record insertion, and so on.
Removed the cleanup function
Added as_query_select() to choose which bins to get from each record.
Changed the predicate to integer_range().
Removed dumping of records from query_callback.

I noticed that the crash is different, though:

#0  0x000000000040c192 in as_record_bin_forupdate (rec=0x7ffff50d2c90, name=0x7ffff50cd990 "stamp") at src/main/aerospike/as_record.c:101
#1  0x000000000040c5b0 in as_record_set_int64 (rec=<optimized out>, name=0x7ffff50cd990 "stamp", value=1414136640) at src/main/aerospike/as_record.c:239
#2  0x0000000000424ed5 in clbin_to_asrecord (bin=0x7ffff50cd990, r=0x7ffff50d2c90) at src/main/aerospike/_shim.c:298
#3  0x0000000000424f28 in clbins_to_asrecord (bins=0x7ffff50cd990, nbins=<optimized out>, r=0x7ffff50d2c90) at src/main/aerospike/_shim.c:338
#4  0x000000000042019f in cl_query_worker_do (node=0x641390, task=0x7ffff50d2e60) at src/main/citrusleaf/cl_query.c:860
#5  0x0000000000420494 in cl_query_worker (pv_asc=0x641010) at src/main/citrusleaf/cl_query.c:1018
#6  0x00007ffff77e9e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7  0x00007ffff6ecaccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x0000000000000000 in ?? ()

Sorry for the confusion.

Best Regards, Leonidas Tsampros

BhuvanRamK · November 25, 2014, 12:43pm

Thanks for the update. I’ll follow-up on this.

dlhero · November 25, 2014, 4:56pm

By the way, I was able to reproduce the same error using a custom stream UDF.

aql> aggregate query.my_stream_udf() on test.demo2 where stamp between 0 and 1411827150;
[New Thread 0x7ffff1485700 (LWP 7581)]
[New Thread 0x7ffff0a84700 (LWP 7582)]
[New Thread 0x7ffff0083700 (LWP 7583)]
[New Thread 0x7fffef682700 (LWP 7584)]
[New Thread 0x7fffeec81700 (LWP 7585)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff0083700 (LWP 7583)]
0x0000003c5327611c in malloc_consolidate () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003c5327611c in malloc_consolidate () from /lib64/libc.so.6
#1  0x0000003c532797cb in _int_malloc () from /lib64/libc.so.6
#2  0x0000003c5327a911 in malloc () from /lib64/libc.so.6
#3  0x0000000000472214 in as_integer_new (value=278929695) at src/main/aerospike/as_integer.c:59
#4  0x0000000000487de1 in as_unpack_integer (pk=<value optimized out>, val=0x7fffef684130) at src/main/aerospike/as_msgpack.c:407
#5  as_unpack_val (pk=<value optimized out>, val=0x7fffef684130) at src/main/aerospike/as_msgpack.c:520
#6  0x0000000000487fc5 in as_unpack_map (pk=0x7fffef684180, size=2, val=0x7fffef6841d8) at src/main/aerospike/as_msgpack.c:457
#7  0x0000000000472a9e in as_msgpack_serializer_deserialize (s=<value optimized out>, buff=<value optimized out>, v=<value optimized out>) at src/main/aerospike/as_msgpack_serializer.c:146
#8  0x00000000004659bf in as_serializer_deserialize (bin=0x7ffff0081970, r=0x7ffff0082c30) at /home/citrusleaf/BUILD/aerospike-client-c/modules/common/target/Linux-x86_64/include/aerospike/as_serializer.h:87
#9  clbin_to_asrecord (bin=0x7ffff0081970, r=0x7ffff0082c30) at src/main/aerospike/_shim.c:319
#10 0x0000000000465a62 in clbins_to_asrecord (bins=<value optimized out>, nbins=<value optimized out>, r=0x7ffff0082c30) at src/main/aerospike/_shim.c:339
#11 0x0000000000463075 in cl_query_worker_do (node=0x7f03e0, task=<value optimized out>) at src/main/citrusleaf/cl_query.c:865
#12 0x0000000000463360 in cl_query_worker (pv_asc=0x7f0150) at src/main/citrusleaf/cl_query.c:1023
#13 0x0000003c53607851 in start_thread () from /lib64/libpthread.so.0
#14 0x0000003c532e890d in clone () from /lib64/libc.so.6

I think it’s related to the previous segfaults, since the problem only happens when I use large ranges. If I decrease the query range, the issue is not reproduced.
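
Because the crash depends on the size of the range, one way to narrow down where it starts (or to work around it until a fix is available) is to split a large range into smaller consecutive sub-ranges and query each one separately. This is only a sketch under stated assumptions, and nobody in this thread confirmed it avoids the crash. The names for_each_subrange, range_fn, and log_range are hypothetical; log_range is a stand-in that records the bounds where a real program would issue the range query.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: handles one inclusive sub-range [lo, hi]. */
typedef int (*range_fn)(int64_t lo, int64_t hi, void *udata);

/*
 * Walks [lo, hi] in consecutive inclusive sub-ranges of at most `step`
 * values and calls fn on each one. Stops at the first non-zero return
 * from fn. Returns the number of sub-ranges that completed, or -1 on
 * invalid arguments.
 */
int for_each_subrange(int64_t lo, int64_t hi, int64_t step,
                      range_fn fn, void *udata)
{
    if (step <= 0 || lo > hi || fn == NULL) {
        return -1;
    }
    int count = 0;
    int64_t start = lo;
    for (;;) {
        /* Unsigned subtraction gives the distance without signed overflow. */
        uint64_t remaining = (uint64_t)hi - (uint64_t)start;
        int64_t end = (remaining >= (uint64_t)step) ? start + (step - 1) : hi;
        if (fn(start, end, udata) != 0) {
            break;
        }
        count++;
        if (end == hi) {
            break;
        }
        start = end + 1;
    }
    return count;
}

/* Stand-in for a real query call: records the bounds it was given. */
struct range_log {
    int64_t lo[8];
    int64_t hi[8];
    int n;
};

int log_range(int64_t lo, int64_t hi, void *udata)
{
    struct range_log *log = udata;
    if (log->n >= 8) {
        return 1; /* log is full: ask the caller to stop */
    }
    log->lo[log->n] = lo;
    log->hi[log->n] = hi;
    log->n++;
    return 0;
}
```

Shrinking the step until the crash disappears would also give a rough figure for how many records per query trigger it, which is useful information for a bug report.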

blitzkreig · January 5, 2015, 5:23pm

Hi, any updates on this?

BhuvanRamK · January 5, 2015, 8:01pm

Hi Leonidas,

I’m sorry about the delay. I’m not able to confirm that we have a fix for this issue, because I have not been able to reproduce it with the exact same back-trace.

Can you let me know the range of the large result set you are parsing? How many entries exist, and how many do you expect to get back from the range query?

In the meantime, we have recently released a C client with a potential fix for crashes when parsing large result sets.

Can you try this and let me know if you still see the crash?

Sincerely, Bhuvana
