Error when one of the nodes try to load

Hey,

We have a 8 nodes cluster running in Kuberneties. One of our nodes has died and refused to load again.

Attach the log:

WARNING (drv_ssd): (drv_ssd.c:2639) /opt/aerospike/data/aerospike-release-5-storage.dat: found used wblock after skipping 1 unused
WARNING (particle): (particle_map.c:939) map_from_flat() invalid packed map
CRITICAL (drv_ssd): (drv_ssd.c:2515) particle replace failed
WARNING (as): (signal.c:213) SIGUSR1 received, aborting Aerospike Community Edition build 5.0.0.4 os debian9
WARNING (as): (log.c:604) stacktrace: registers: rax 0000000000000000 rbx 000000000000000a rcx 00007f27b3f81fbf rdx 0000000000000000 rsi 00007f276bdf69b0 rdi 0000000000000002 rbp 00007f276bdfac40 rsp 00007f276bdf6a28 r8 0000000000000000 r9 00007f276bdf69b0 r10 0000000000000008 r11 0000000000000246 r12 00007f27b0600000 r13 0000000000000001 r14 00007f26ac471cee r15 0000000000000000 rip 00007f27b3f81fbf
WARNING (as): (log.c:617) stacktrace: found 11 frames: 0x169c33 0xb91e9 0x7f27b3f820e0 0x7f27b3f81fbf 0x1695b6 0x1387a4 0x138b36 0x138e22 0x157060 0x7f27b3f784a4 0x7f27b2df7d0f offset 0x5614c6f7b000
WARNING (as): (log.c:627) stacktrace: frame 0: asd(cf_log_stack_trace+0xe8) [0x5614c70e4c33]
WARNING (as): (log.c:627) stacktrace: frame 1: asd(as_sig_handle_usr1+0x5f) [0x5614c70341e9]
WARNING (as): (log.c:627) stacktrace: frame 2: /lib/x86_64-linux-gnu/libpthread.so.0(+0x110e0) [0x7f27b3f820e0]
WARNING (as): (log.c:627) stacktrace: frame 3: /lib/x86_64-linux-gnu/libpthread.so.0(raise+0xcf) [0x7f27b3f81fbf]
WARNING (as): (log.c:627) stacktrace: frame 4: asd(cf_log_write_no_return+0x86) [0x5614c70e45b6]
WARNING (as): (log.c:627) stacktrace: frame 5: asd(ssd_cold_start_add_record+0xdc4) [0x5614c70b37a4]
WARNING (as): (log.c:627) stacktrace: frame 6: asd(ssd_cold_start_sweep+0x2f6) [0x5614c70b3b36]
WARNING (as): (log.c:627) stacktrace: frame 7: asd(run_ssd_cold_start+0x82) [0x5614c70b3e22]
WARNING (as): (log.c:627) stacktrace: frame 8: asd(+0x157060) [0x5614c70d2060]
WARNING (as): (log.c:627) stacktrace: frame 9: /lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4) [0x7f27b3f784a4]
WARNING (as): (log.c:627) stacktrace: frame 10: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f27b2df7d0f]
INFO (drv_ssd): (drv_ssd.c:3510) {storage} loaded: objects 6574952 device-pcts (9)
INFO (drv_ssd): (drv_ssd.c:3510) {storage} loaded: objects 6574952 device-pcts (9)
INFO (drv_ssd): (drv_ssd.c:3510) {storage} loaded: objects 6574952 device-pcts (9)
INFO (drv_ssd): (drv_ssd.c:3510) {storage} loaded: objects 6574952 device-pcts (9)
INFO (drv_ssd): (drv_ssd.c:3510) {storage} loaded: objects 6574952 device-pcts (9)

Can you please assist with that? Thanks

Try running the asvalidation tool to go after corrupted CDT bins (that may have been inserted in older versions).

I run this command but no issue found.

Output:

2021-05-04 09:44:32 GMT [INF] [  241] Starting 100% validation of aerospike-release.services.svc.cluster.local (namespace: storage, set: [all], bins: [all], after: [none], before: [none]) to rep.txt
2021-05-04 09:44:32 GMT [INF] [  241] [src/main/aerospike/as_cluster.c:195][as_cluster_add_nodes_copy] Add node BB9ACA48945B982 xx.xxx.16.121:3000
2021-05-04 09:44:32 GMT [INF] [  241] [src/main/aerospike/as_cluster.c:195][as_cluster_add_nodes_copy] Add node BB9F2F0CF6984DA xx.xxx.46.137:3000
2021-05-04 09:44:32 GMT [INF] [  241] [src/main/aerospike/as_cluster.c:195][as_cluster_add_nodes_copy] Add node BB978F2A4AA7B42 xx.xxx.36.62:3000
2021-05-04 09:44:32 GMT [INF] [  241] [src/main/aerospike/as_cluster.c:195][as_cluster_add_nodes_copy] Add node BB950314CA5D48A xx.xxx.16.193:3000
2021-05-04 09:44:32 GMT [INF] [  241] [src/main/aerospike/as_cluster.c:195][as_cluster_add_nodes_copy] Add node BB9869E11D2F32A xx.xxx.66.242:3000
2021-05-04 09:44:32 GMT [INF] [  241] [src/main/aerospike/as_cluster.c:195][as_cluster_add_nodes_copy] Add node BB9615DD31F1476 xx.xxx.66.117:3000
2021-05-04 09:44:32 GMT [INF] [  241] [src/main/aerospike/as_cluster.c:195][as_cluster_add_nodes_copy] Add node BB942B546E319C6 xx.xxx.26.156:3000
2021-05-04 09:44:32 GMT [INF] [  241] Processing 7 node(s)
2021-05-04 09:44:32 GMT [INF] [  241] Node ID             Objects        Replication    
2021-05-04 09:44:32 GMT [INF] [  241] BB9ACA48945B982     4418521        1              
2021-05-04 09:44:32 GMT [INF] [  241] BB9F2F0CF6984DA     7698564        1              
2021-05-04 09:44:32 GMT [INF] [  241] BB978F2A4AA7B42     4651881        1              
2021-05-04 09:44:32 GMT [INF] [  241] BB950314CA5D48A     4843411        1              
2021-05-04 09:44:32 GMT [INF] [  241] BB9869E11D2F32A     5011704        1              
2021-05-04 09:44:32 GMT [INF] [  241] BB9615DD31F1476     4834055        1              
2021-05-04 09:44:32 GMT [INF] [  241] BB942B546E319C6     4574060        1              
2021-05-04 09:44:32 GMT [INF] [  241] Namespace contains 36032196 record(s)
2021-05-04 09:44:32 GMT [INF] [  241] Created new output file rep.txt
2021-05-04 09:44:32 GMT [INF] [  265] Starting validation for node BB9ACA48945B982
2021-05-04 09:44:32 GMT [INF] [  266] Starting validation for node BB9F2F0CF6984DA
2021-05-04 09:44:32 GMT [INF] [  264] Starting validation for node BB978F2A4AA7B42
2021-05-04 09:44:32 GMT [INF] [  263] Starting validation for node BB950314CA5D48A
2021-05-04 09:44:32 GMT [INF] [  262] Starting validation for node BB9869E11D2F32A
2021-05-04 09:44:32 GMT [INF] [  261] Starting validation for node BB9615DD31F1476
2021-05-04 09:44:32 GMT [INF] [  260] Starting validation for node BB942B546E319C6
2021-05-04 09:44:33 GMT [INF] [  259] 6% complete (~2434717 rec/s)
2021-05-04 09:44:33 GMT [INF] [  259] ~13s remaining
2021-05-04 09:44:42 GMT [INF] [  260] Completed validation for node BB942B546E319C6, records: 0, size: 0 (~0 B/rec)
2021-05-04 09:44:43 GMT [INF] [  259] 72% complete (~2165267 rec/s)
2021-05-04 09:44:43 GMT [INF] [  259] ~4s remaining
2021-05-04 09:44:44 GMT [INF] [  265] Completed validation for node BB9ACA48945B982, records: 0, size: 0 (~0 B/rec)
2021-05-04 09:44:44 GMT [INF] [  264] Completed validation for node BB978F2A4AA7B42, records: 0, size: 0 (~0 B/rec)
2021-05-04 09:44:45 GMT [INF] [  262] Completed validation for node BB9869E11D2F32A, records: 0, size: 0 (~0 B/rec)
2021-05-04 09:44:45 GMT [INF] [  266] Completed validation for node BB9F2F0CF6984DA, records: 0, size: 0 (~0 B/rec)
2021-05-04 09:44:46 GMT [INF] [  263] Completed validation for node BB950314CA5D48A, records: 0, size: 0 (~0 B/rec)
2021-05-04 09:44:46 GMT [INF] [  261] Completed validation for node BB9615DD31F1476, records: 0, size: 0 (~0 B/rec)
2021-05-04 09:44:47 GMT [INF] [  259] Found 0 invalid record(s) from 7 node(s), 0 byte(s) in total (~0 B/rec)
2021-05-04 09:44:47 GMT [INF] [  259] CDT Mode: validate
2021-05-04 09:44:47 GMT [INF] [  259]          0 Lists
2021-05-04 09:44:47 GMT [INF] [  259]          0   Unfixable
2021-05-04 09:44:47 GMT [INF] [  259]          0     Has non-storage
2021-05-04 09:44:47 GMT [INF] [  259]          0     Corrupted
2021-05-04 09:44:47 GMT [INF] [  259]          0   Need Fix
2021-05-04 09:44:47 GMT [INF] [  259]          0     Fixed
2021-05-04 09:44:47 GMT [INF] [  259]          0     Fix failed
2021-05-04 09:44:47 GMT [INF] [  259]          0     Order
2021-05-04 09:44:47 GMT [INF] [  259]          0     Padding
2021-05-04 09:44:47 GMT [INF] [  259]   32974403 Maps
2021-05-04 09:44:47 GMT [INF] [  259]          0   Unfixable
2021-05-04 09:44:47 GMT [INF] [  259]          0     Has duplicate keys
2021-05-04 09:44:47 GMT [INF] [  259]          0     Has non-storage
2021-05-04 09:44:47 GMT [INF] [  259]          0     Corrupted
2021-05-04 09:44:47 GMT [INF] [  259]          0   Need Fix
2021-05-04 09:44:47 GMT [INF] [  259]          0     Fixed
2021-05-04 09:44:47 GMT [INF] [  259]          0     Fix failed
2021-05-04 09:44:47 GMT [INF] [  259]          0     Order
2021-05-04 09:44:47 GMT [INF] [  259]          0     Padding

Hum… interesting that it seems all the Maps are good. Was the node that had gone down up and running when this was run (was the asvalidation tool run against all nodes)?

BTW, I had missed the first WARNING in the log you shared:

WARNING (drv_ssd): (drv_ssd.c:2639) /opt/aerospike/data/aerospike-release-5-storage.dat: found used wblock after skipping 1 unused

Not sure if this could be related but definitely not something good… was the storage device previously used and not properly wiped out before being put in service again? Make sure you always properly clean up storage devices before re-using them (look for blkdiscard in the forum for details).

I solved this issue by deleting the dat file from the pod. /opt/aerospike/data/aerospike-release-5-storage.dat

Thanks for closing the loop!

© 2021 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.