asd 3.6.3 crashed with SSD storage


#1
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: frame 12: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f94944fa7bd]
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: frame 11: /lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7f949526bb50]
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: frame 10: /opt/tiger/aerospike-deploy/bin/asd(thr_tsvc+0x1b) [0x4c871c]
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: frame 9: /opt/tiger/aerospike-deploy/bin/asd(process_transaction+0xeec) [0x4c854d]
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: frame 8: /opt/tiger/aerospike-deploy/bin/asd(as_read_start+0x93) [0x4c3e5c]
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: frame 7: /opt/tiger/aerospike-deploy/bin/asd(read_local+0x70c) [0x4c0014]
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: frame 6: /opt/tiger/aerospike-deploy/bin/asd(as_storage_record_close_ssd+0x36) [0x5042ee]
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: frame 5: /opt/tiger/aerospike-deploy/bin/asd(cf_free_at+0x19) [0x5066f1]
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: frame 4: /opt/tiger/aerospike-deploy/bin/asd() [0x59b3e9]
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: frame 3: /opt/tiger/aerospike-deploy/bin/asd() [0x5979ff]
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: frame 2: /opt/tiger/aerospike-deploy/bin/asd() [0x59750b]
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x321e0) [0x7f94944501e0]
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: frame 0: /opt/tiger/aerospike-deploy/bin/asd(as_sig_handle_segv+0x5d) [0x47d9ac]
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::163) stacktrace: found 13 frames
Oct 23 2015 11:40:58 GMT: WARNING (as): (signal.c::161) SIGSEGV received, aborting Aerospike Community Edition build 3.6.3 os debian7

#2

There is another issue.

Oct 28 2015 11:42:34 GMT: WARNING (demarshal): (thr_demarshal.c::354) dropping incoming client connection: hit limit 15001 connections

I checked the process’s /proc//fd, but there aren’t nearly that many open fds. I think the connection statistics are wrong.

Also, I may have hit the crash above again, although the symptoms differ: many threads were pinned at 100% CPU in malloc and free without making any progress.


#3

@xiaosuo,

What was happening up to the crash? Specifically, are you making use of batch-reads, scans or queries?

It would help to see your aerospike.conf (the config file) and to know which client release you’re using. It looks like you’re running server version 3.6.3 on Debian 7.

Regarding your latest error, it looks like you’ve hit proto-fd-max, which is the maximum number of file descriptors that can be open on behalf of client connections. The default value is 15000, but you can increase it with the following command (this example raises it to 100000):

 asinfo -v 'set-config:context=service;proto-fd-max=100000'

More information is available on this page.
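
Before raising the limit, it may help to read back the value the node is actually using. The following is a minimal sketch, not an official tool: `info_value` is a hypothetical helper that assumes asinfo returns semicolon-separated `key=value` pairs, and the `client_connections` statistic name in the usage comment is an assumption for this server release.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from an asinfo response.
# Assumes the response on stdin is a list of key=value pairs separated by ';'.
info_value() {
    # Put each pair on its own line, then print the value of the requested key.
    tr ';' '\n' | awk -F= -v key="$1" '$1 == key { print $2 }'
}

# Example usage against a running node (not executed here):
#   asinfo -v 'get-config:context=service' | info_value proto-fd-max
#   asinfo -v 'statistics' | info_value client_connections   # statistic name is an assumption
```

Comparing the two numbers shows whether the node believes it is at the limit, independently of what the operating system reports.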


#4

Yes. I was doing batch-reads at the time.

Client: client-c, 3.1.20

Server: 3.6.3

OS: debian 7

I don’t think I actually hit proto-fd-max. I suspect an fd leak instead, because there aren’t that many fds in /proc//fd.
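
The check described above can be sketched roughly as follows. This is only an illustration: `count_fds` is a hypothetical helper, and it assumes the path refers to the Linux per-process descriptor directory `/proc/<pid>/fd`, where each entry is a symlink and sockets appear as `socket:[inode]`.

```shell
#!/bin/sh
# Hypothetical helper: count a process's open file descriptors and how many are sockets.
# Assumes a Linux /proc filesystem and permission to read /proc/<pid>/fd.
count_fds() {
    pid="$1"
    total=0
    sockets=0
    for fd in "/proc/$pid/fd"/*; do
        # Skip the unexpanded glob if the directory is empty or unreadable.
        [ -L "$fd" ] || continue
        total=$((total + 1))
        # Socket descriptors resolve to a target of the form socket:[inode].
        case "$(readlink "$fd")" in
            socket:*) sockets=$((sockets + 1)) ;;
        esac
    done
    echo "total=$total sockets=$sockets"
}

# Example usage (assumes a single asd process; not executed here):
#   count_fds "$(pidof asd)"
```

If the socket count stays far below the limit reported in the log line, that would support the suspicion that the server-side counter, rather than the real descriptor usage, is what reached the limit.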

Thank you.


#5

@xiaosuo,

We just released Aerospike Server CE 3.6.4, which fixes batch-related node crashes. Could you please upgrade to this version and let us know whether you’re still experiencing this issue?


#6

We are trying. Thanks