Aerospike cold start fails

Hi, I have a problem with the Aerospike cold start.

I don’t see any useful information in the log. Why does this error occur?

Sep 08 2021 09:19:48 GMT: INFO (config): (cfg.c:3967) system file descriptor limit: 100000, proto-fd-max: 15000
Sep 08 2021 09:19:48 GMT: INFO (hardware): (hardware.c:1987) detected 160 CPU(s), 80 core(s), 4 NUMA node(s)
Sep 08 2021 09:19:48 GMT: INFO (socket): (socket.c:2676) Node port 3001, node ID bb998b0d080dadc
Sep 08 2021 09:19:48 GMT: INFO (config): (cfg.c:4010) node-id bb998b0d080dadc
Sep 08 2021 09:19:48 GMT: INFO (namespace): (namespace_ce.c:87) {ns1} beginning cold start
Sep 08 2021 09:19:48 GMT: INFO (drv_ssd): (drv_ssd.c:3521) opened file /home/admin/mw/aerospike/data/ns1.dat: usable size 10737418240
Sep 08 2021 09:19:48 GMT: INFO (drv_ssd): (drv_ssd.c:1073) /home/admin/mw/aerospike/data/ns1.dat has 10240 wblocks of size 1048576
Sep 08 2021 09:19:48 GMT: INFO (drv_ssd): (drv_ssd.c:3215) {ns1} device /home/admin/mw/aerospike/data/ns1.dat prior shutdown not clean
Sep 08 2021 09:19:48 GMT: INFO (drv_ssd): (drv_ssd.c:3007) device /home/admin/mw/aerospike/data/ns1.dat: reading device to load index
Sep 08 2021 09:19:48 GMT: INFO (drv_ssd): (drv_ssd.c:3016) device /home/admin/mw/aerospike/data/ns1.dat: read complete: UNIQUE 0 (REPLACED 0) (OLDER 0) (EXPIRED 0) (MAX-TTL 0) records
Sep 08 2021 09:19:48 GMT: INFO (drv_ssd): (drv_ssd.c:1038) {ns1} loading free & defrag queues
Sep 08 2021 09:19:48 GMT: INFO (drv_ssd): (drv_ssd.c:977) /home/admin/mw/aerospike/data/ns1.dat init defrag profile: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Sep 08 2021 09:19:48 GMT: INFO (drv_ssd): (drv_ssd.c:1062) /home/admin/mw/aerospike/data/ns1.dat init wblock free-q 10232, defrag-q 0
Sep 08 2021 09:19:48 GMT: INFO (drv_ssd): (drv_ssd.c:2364) {ns1} starting device maintenance threads
Sep 08 2021 09:19:48 GMT: INFO (drv_ssd): (drv_ssd.c:1627) {ns1} starting write worker threads
Sep 08 2021 09:19:48 GMT: INFO (drv_ssd): (drv_ssd.c:887) {ns1} starting defrag threads
Sep 08 2021 09:19:48 GMT: INFO (as): (as.c:379) initializing services...
Sep 08 2021 09:19:48 GMT: INFO (tsvc): (thr_tsvc.c:136) 160 transaction queues: starting 4 threads per queue
Sep 08 2021 09:19:48 GMT: INFO (hb): (hb.c:6776) added new mesh seed 10.1.1.51:3002
Sep 08 2021 09:19:48 GMT: INFO (fabric): (fabric.c:803) updated fabric published address list to {10.1.1.51:3001}
Sep 08 2021 09:19:48 GMT: INFO (partition): (partition_balance.c:200) {ns1} 4096 partitions: found 4096 absent, 0 stored
Sep 08 2021 09:19:48 GMT: INFO (hb): (hb.c:5505) updated heartbeat published address list to {10.1.1.51:3002}
Sep 08 2021 09:19:48 GMT: INFO (batch): (batch.c:732) starting 160 batch-index-threads
Sep 08 2021 09:19:48 GMT: INFO (batch): (thr_batch.c:373) starting 4 batch-threads
Sep 08 2021 09:19:48 GMT: INFO (health): (health.c:320) starting health monitor thread
Sep 08 2021 09:19:48 GMT: INFO (fabric): (fabric.c:453) starting 8 fabric send threads
Sep 08 2021 09:19:48 GMT: INFO (fabric): (fabric.c:470) starting 16 fabric rw channel recv threads
Sep 08 2021 09:19:48 GMT: INFO (fabric): (fabric.c:470) starting 4 fabric ctrl channel recv threads
Sep 08 2021 09:19:48 GMT: INFO (fabric): (fabric.c:470) starting 4 fabric bulk channel recv threads
Sep 08 2021 09:19:48 GMT: INFO (fabric): (fabric.c:470) starting 4 fabric meta channel recv threads
Sep 08 2021 09:19:48 GMT: INFO (fabric): (fabric.c:476) starting fabric accept thread
Sep 08 2021 09:19:48 GMT: INFO (hb): (hb.c:7018) initializing mesh heartbeat socket: 10.1.1.51:3002
Sep 08 2021 09:19:48 GMT: INFO (fabric): (socket.c:702) Started fabric endpoint 0.0.0.0:3001
Sep 08 2021 09:19:48 GMT: INFO (hb): (hb.c:7047) mtu of the network is 1500
Sep 08 2021 09:19:48 GMT: INFO (hb): (socket.c:702) Started mesh heartbeat endpoint 10.1.1.51:3002
Sep 08 2021 09:19:48 GMT: INFO (nsup): (thr_nsup.c:1103) starting namespace supervisor threads
Sep 08 2021 09:19:48 GMT: INFO (demarshal): (thr_demarshal.c:886) starting 160 demarshal threads
Sep 08 2021 09:19:48 GMT: INFO (demarshal): (socket.c:702) Started client endpoint 0.0.0.0:3000
Sep 08 2021 09:19:48 GMT: INFO (info-port): (thr_info_port.c:300) starting info port thread
Sep 08 2021 09:19:48 GMT: INFO (info-port): (socket.c:702) Started info endpoint 0.0.0.0:3003
Sep 08 2021 09:19:48 GMT: INFO (as): (as.c:424) service ready: soon there will be cake!
Sep 08 2021 09:19:49 GMT: INFO (hb): (hb.c:6328) removing self seed entry host:10.1.1.51 port:3002
Sep 08 2021 09:19:49 GMT: INFO (hb): (hb.c:5704) removed mesh seed host:10.1.1.51 port 3002
Sep 08 2021 09:19:49 GMT: INFO (hb): (hb.c:4334) found redundant connections to same node, fds 146 143 - choosing at random
Sep 08 2021 09:19:49 GMT: INFO (info): (thr_info.c:4215) Aerospike Telemetry Agent: Aerospike anonymous data collection is ACTIVE. For further information, see http://aerospike.com/aerospike-telemetry
Sep 08 2021 09:19:49 GMT: WARNING (as): (signal.c:184) SIGSEGV received, aborting Aerospike Community Edition build 4.3.1.4 os el7
Sep 08 2021 09:19:49 GMT: WARNING (as): (signal.c:186) stacktrace: registers: rax 00000000000000a0 rbx 0000000000000000 rcx 0000000000020000 rdx 000000000000000a rsi 00007fd6d1150fd7 rdi 00007fdc5d61a984 rbp 00007fdc5d61a984 rsp 00007fd6d1150ca0 r8 0000000000000000 r9 0000000000000002 r10 0000000000000000 r11 00007fd85dcdaf10 r12 0000000000000000 r13 00007fd5fca0d132 r14 00007fd6d1150e00 r15 00007fd85d6201e0 rip 0000000000564d0b
Sep 08 2021 09:19:49 GMT: WARNING (as): (signal.c:186) stacktrace: found 9 frames: 0x4903f9 0x7fd85ed6b370 0x564d0b 0x4c8f81 0x4aac2b 0x4ad420 0x4ad8b4 0x7fd85ed63dc5 0x7fd85dc5073d offset 0x400000
Sep 08 2021 09:19:49 GMT: WARNING (as): (signal.c:186) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_segv+0x115) [0x4903f9]
Sep 08 2021 09:19:49 GMT: WARNING (as): (signal.c:186) stacktrace: frame 1: /lib64/libpthread.so.0(+0xf370) [0x7fd85ed6b370]
Sep 08 2021 09:19:49 GMT: WARNING (as): (signal.c:186) stacktrace: frame 2: /usr/bin/asd(cf_queue_sz+0x7) [0x564d0b]
Sep 08 2021 09:19:49 GMT: WARNING (as): (signal.c:186) stacktrace: frame 3: /usr/bin/asd(as_tsvc_queue_get_size+0x21) [0x4c8f81]
Sep 08 2021 09:19:49 GMT: WARNING (as): (signal.c:186) stacktrace: frame 4: /usr/bin/asd(info_get_stats+0x1b3) [0x4aac2b]
Sep 08 2021 09:19:49 GMT: WARNING (as): (signal.c:186) stacktrace: frame 5: /usr/bin/asd(info_some+0x235) [0x4ad420]
Sep 08 2021 09:19:49 GMT: WARNING (as): (signal.c:186) stacktrace: frame 6: /usr/bin/asd(thr_info_fn+0x1c8) [0x4ad8b4]
Sep 08 2021 09:19:49 GMT: WARNING (as): (signal.c:186) stacktrace: frame 7: /lib64/libpthread.so.0(+0x7dc5) [0x7fd85ed63dc5]
Sep 08 2021 09:19:49 GMT: WARNING (as): (signal.c:186) stacktrace: frame 8: /lib64/libc.so.6(clone+0x6d) [0x7fd85dc5073d]
addr2line -fie /usr/bin/asd 0x4903f9 0x7fd85ed6b370 0x564d0b 0x4c8f81 0x4aac2b 0x4ad420 0x4ad8b4 0x7fd85ed63dc5 0x7fd85dc5073d
as_sig_handle_segv
/work/source/as/src/base/signal.c:186
??
??:0
cf_queue_lock
/work/source/modules/common/src/main/citrusleaf/cf_queue.c:98
cf_queue_sz
/work/source/modules/common/src/main/citrusleaf/cf_queue.c:114
as_tsvc_queue_get_size
/work/source/as/src/base/thr_tsvc.c:198 (discriminator 2)
info_get_stats
/work/source/as/src/base/thr_info.c:342
info_some
/work/source/as/src/base/thr_info.c:4360
thr_info_fn
/work/source/as/src/base/thr_info.c:4497
??
??:0
??
??:0
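
For anyone repeating this, the addr2line step can be scripted. The sketch below is only an illustration: it reads the `stacktrace: frame N:` lines in the format shown in the log above and builds an addr2line command containing only the addresses that belong to /usr/bin/asd. The shared-library frames (libpthread, libc) resolve to `??` when looked up against the asd binary, as the output above shows. The function names `parse_frames` and `addr2line_command` are hypothetical helpers, not part of any Aerospike tooling, and the `-fie` flags are simply copied from the command used in this post.

```python
import re
import shlex

# Matches frame lines in the format seen in the log above, e.g.
#   stacktrace: frame 2: /usr/bin/asd(cf_queue_sz+0x7) [0x564d0b]
FRAME_RE = re.compile(
    r"stacktrace: frame (\d+): (\S+?)\((.*?)\) \[(0x[0-9a-fA-F]+)\]"
)


def parse_frames(log_text):
    """Return a list of (index, module, symbol, address) tuples, one per frame line."""
    return [
        (int(m.group(1)), m.group(2), m.group(3), m.group(4))
        for m in FRAME_RE.finditer(log_text)
    ]


def addr2line_command(frames, binary="/usr/bin/asd"):
    """Build an addr2line command for the frames that live in `binary`.

    Frames from other modules are skipped because their addresses
    cannot be resolved against this binary.
    """
    addrs = [addr for _, module, _, addr in frames if module == binary]
    return "addr2line -fie " + shlex.quote(binary) + " " + " ".join(addrs)


if __name__ == "__main__":
    sample = (
        "WARNING (as): (signal.c:186) stacktrace: frame 1: "
        "/lib64/libpthread.so.0(+0xf370) [0x7fd85ed6b370]\n"
        "WARNING (as): (signal.c:186) stacktrace: frame 2: "
        "/usr/bin/asd(cf_queue_sz+0x7) [0x564d0b]\n"
    )
    print(addr2line_command(parse_frames(sample)))
```

Note that because of `-i`, addr2line prints inlined callers as extra function/line pairs (here `cf_queue_lock` inlined into `cf_queue_sz`), so the output can contain more pairs than the number of addresses passed in.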

What type of node are you running on? Do you know if it has ECC memory?

Is this crash reproducible?

Thank you for your help.

This machine uses registered ECC memory.

I switched to another server later, and this error did not appear again.

We couldn’t identify a way for the pointer to the transaction queue to be NULL or invalid, which appears to be what happened in that stack. That said, we have removed these queues in newer versions of Aerospike, so if this were a bug with those queues, it was likely “fixed” by their removal. If this happens again, I’d suggest upgrading to the latest version; 5.7 should be released soon.
