We’re using Aerospike Community Edition build 3.12.1 on Debian 8. One node in the cluster was restarted with the reboot command, and afterwards Aerospike failed to start with the message "community edition found tombstone - erase drive and restart".
Our config:
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 24
    transaction-queues 24
    transaction-threads-per-queue 24
    proto-fd-max 100000
    proto-fd-idle-ms 10000
}
logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.
        # List one or more other nodes, one ip-address & port per line:
        mesh-seed-address-port 10.53.96.95 3002
        mesh-seed-address-port 10.53.96.96 3002
        mesh-seed-address-port 10.53.96.97 3002
        mesh-seed-address-port 10.53.96.98 3002
        mesh-seed-address-port 10.53.96.99 3002
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace abox {
    replication-factor 2
    memory-size 20G # Maximum memory allocation for primary
                    # and secondary indexes.
    storage-engine device { # Configure the storage-engine to use persistence
        device /dev/sdb # raw device. Maximum size is 2 TiB
        write-block-size 128K # adjust block size to make it efficient for SSDs.
    }
}
Log:
Aug 16 2017 13:16:56 GMT: WARNING (drv_ssd): (drv_ssd.c:3094) error: block extends over read size: foff 45828014080 boff 529408 blen 452985072
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:483) untrustworthy data from disk [offset], ignoring record
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:484) ns->name = abox
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:485) bin 5 [of 7]
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:488) ssd_bin->offset = 1229800513
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:489) ssd_bin->len = 37632
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:490) ssd_bin->next = 3840
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:483) untrustworthy data from disk [offset], ignoring record
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:484) ns->name = abox
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:485) bin 4 [of 14]
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:488) ssd_bin->offset = 1425833907
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:489) ssd_bin->len = 3773480322
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:490) ssd_bin->next = 1126
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:483) untrustworthy data from disk [next ptr], ignoring record
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:484) ns->name = abox
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:485) bin 11 [of 19]
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:488) ssd_bin->offset = 1288
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:489) ssd_bin->len = 0
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:490) ssd_bin->next = 0
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:483) untrustworthy data from disk [next ptr], ignoring record
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:484) ns->name = abox
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:485) bin 45 [of 84]
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:488) ssd_bin->offset = 2056
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:489) ssd_bin->len = 0
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:490) ssd_bin->next = 0
Aug 16 2017 13:16:56 GMT: FAILED ASSERTION (drv_ssd): (drv_ssd_ce.c:45) community edition found tombstone - erase drive and restart
Aug 16 2017 13:16:56 GMT: WARNING (as): (signal.c:210) SIGUSR1 received, aborting Aerospike Community Edition build 3.12.1 os debian8
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: found 10 frames
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 0: /usr/bin/asd(as_sig_handle_usr1+0x31) [0x485137]
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x350e0) [0x7f8b358cb0e0]
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 2: /lib/x86_64-linux-gnu/libpthread.so.0(raise+0x2b) [0x7f8b36a9975b]
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 3: /usr/bin/asd(cf_fault_event+0x233) [0x5244dd]
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 4: /usr/bin/asd(ssd_cold_start_is_valid_n_bins+0x25) [0x5077d6]
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 5: /usr/bin/asd(ssd_record_add+0x8a) [0x502feb]
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 6: /usr/bin/asd(ssd_load_device_sweep+0x363) [0x504453]
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 7: /usr/bin/asd(ssd_load_devices_fn+0xa0) [0x504655]
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 8: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8064) [0x7f8b36a92064]
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 9: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f8b3597e62d]
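For what it's worth, here is a quick sanity check on the lengths reported in the log, compared against our write-block-size of 128K. It is only a rough sketch: I'm assuming that blen and ssd_bin->len are byte counts and that nothing stored on the device should be larger than one write block. The dictionary labels and the oversized helper are just names I made up for this check.

```python
# Value taken from the namespace config above: write-block-size 128K.
WRITE_BLOCK_SIZE = 128 * 1024  # bytes

# Lengths copied from the log above.
# Assumption: these fields are byte counts.
reported_lengths = {
    "blen": 452985072,
    "ssd_bin->len (bin 5 of 7)": 37632,
    "ssd_bin->len (bin 4 of 14)": 3773480322,
}


def oversized(lengths, limit):
    """Return the entries whose reported length exceeds the limit."""
    return {name: value for name, value in lengths.items() if value > limit}


suspicious = oversized(reported_lengths, WRITE_BLOCK_SIZE)
for name, value in suspicious.items():
    ratio = value / WRITE_BLOCK_SIZE
    print(f"{name} = {value} bytes, roughly {ratio:.0f}x the write block size")
```

Two of the three lengths come out thousands of times larger than a write block, so to me they look like garbage read from the device rather than real record sizes.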
Is anything wrong with our config? I wouldn’t expect rebooting the server to corrupt the partition. Thanks.