Cold restart failed - tombstone found in CE after reboot


#1

We’re using Aerospike Community Edition build 3.12.1 on Debian 8. One node in the cluster was restarted via the reboot command, and Aerospike then failed to start with the message "community edition found tombstone - erase drive and restart".

Our config:

service {
        user root
        group root
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        pidfile /var/run/aerospike/asd.pid
        service-threads 24
        transaction-queues 24
        transaction-threads-per-queue 24
        proto-fd-max 100000
        proto-fd-idle-ms 10000
}

logging {
        # Log file must be an absolute path.
        file /var/log/aerospike/aerospike.log {
                context any info
        }
}


network {
        service {
                address any
                port 3000
        }

        heartbeat {
                mode mesh
                port 3002 # Heartbeat port for this node.

                # List one or more other nodes, one ip-address & port per line:
                mesh-seed-address-port 10.53.96.95 3002
                mesh-seed-address-port 10.53.96.96 3002
                mesh-seed-address-port 10.53.96.97 3002
                mesh-seed-address-port 10.53.96.98 3002
                mesh-seed-address-port 10.53.96.99 3002

        }

        fabric {
                port 3001
        }

        info {
                port 3003
        }
}

namespace abox {

        replication-factor 2
        memory-size 20G                         # Maximum memory allocation for primary
                                                # and secondary indexes.
        storage-engine device {                 # Configure the storage-engine to use persistence
                device /dev/sdb                 # raw device. Maximum size is 2 TiB
                write-block-size 128K           # adjust block size to make it efficient for SSDs.
        }
}

Log:

Aug 16 2017 13:16:56 GMT: WARNING (drv_ssd): (drv_ssd.c:3094) error: block extends over read size: foff 45828014080 boff 529408 blen 452985072 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:483) untrustworthy data from disk [offset], ignoring record 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:484)    ns->name = abox 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:485)    bin 5 [of 7] 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:488)    ssd_bin->offset = 1229800513 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:489)    ssd_bin->len = 37632 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:490)    ssd_bin->next = 3840 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:483) untrustworthy data from disk [offset], ignoring record 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:484)    ns->name = abox 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:485)    bin 4 [of 14] 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:488)    ssd_bin->offset = 1425833907 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:489)    ssd_bin->len = 3773480322 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:490)    ssd_bin->next = 1126 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:483) untrustworthy data from disk [next ptr], ignoring record 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:484)    ns->name = abox 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:485)    bin 11 [of 19] 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:488)    ssd_bin->offset = 1288 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:489)    ssd_bin->len = 0 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:490)    ssd_bin->next = 0 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:483) untrustworthy data from disk [next ptr], ignoring record 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:484)    ns->name = abox 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:485)    bin 45 [of 84] 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:488)    ssd_bin->offset = 2056 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:489)    ssd_bin->len = 0 
Aug 16 2017 13:16:56 GMT: INFO (drv_ssd): (drv_ssd.c:490)    ssd_bin->next = 0 
Aug 16 2017 13:16:56 GMT: FAILED ASSERTION (drv_ssd): (drv_ssd_ce.c:45) community edition found tombstone - erase drive and restart 
Aug 16 2017 13:16:56 GMT: WARNING (as): (signal.c:210) SIGUSR1 received, aborting Aerospike Community Edition build 3.12.1 os debian8 
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: found 10 frames 
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 0: /usr/bin/asd(as_sig_handle_usr1+0x31) [0x485137] 
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x350e0) [0x7f8b358cb0e0] 
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 2: /lib/x86_64-linux-gnu/libpthread.so.0(raise+0x2b) [0x7f8b36a9975b] 
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 3: /usr/bin/asd(cf_fault_event+0x233) [0x5244dd] 
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 4: /usr/bin/asd(ssd_cold_start_is_valid_n_bins+0x25) [0x5077d6] 
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 5: /usr/bin/asd(ssd_record_add+0x8a) [0x502feb] 
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 6: /usr/bin/asd(ssd_load_device_sweep+0x363) [0x504453] 
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 7: /usr/bin/asd(ssd_load_devices_fn+0xa0) [0x504655] 
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 8: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8064) [0x7f8b36a92064] 
Aug 16 2017 13:16:56 GMT: INFO (as): (signal.c:214) call stack: frame 9: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f8b3597e62d]

Is anything wrong with our config? I wouldn’t expect rebooting the server to corrupt the partition. Thanks…


#2
  1. Is this the first restart since upgrading to 3.12?

  2. What version were you running prior to 3.12?


#3

It wasn’t the first restart, and 3.12 was installed on a fresh machine (so no upgrade).


#4
  1. Was there any data on the disk prior to running with Aerospike?

  2. Did you zeroize the disk?

    How to Add, Replace, & Remove disks

  3. Is this disk mounted?

    cat /proc/mounts
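
The mount check can also be scripted. Below is a minimal sketch that scans `/proc/mounts`-style text for a device or one of its numbered partitions. The function name `mounted_entries` and the partition-matching rule (device path followed only by digits) are assumptions made for this illustration, not anything Aerospike provides.

```python
def mounted_entries(device, mounts_text):
    """Return (source, mount_point) pairs from /proc/mounts-style text that
    refer to `device` itself or to a numbered partition such as /dev/sdb1.

    Hypothetical helper: the matching rule is an assumption for this sketch.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        source, mount_point = fields[0], fields[1]
        suffix = source[len(device):]
        if source == device or (source.startswith(device) and suffix.isdigit()):
            hits.append((source, mount_point))
    return hits


def mounted_entries_live(device):
    """Same check against the running system (Linux only)."""
    with open("/proc/mounts") as f:
        return mounted_entries(device, f.read())
```

An empty list from `mounted_entries_live("/dev/sdb")` suggests that nothing has the raw device mounted. It does not rule out other processes writing to the device directly.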
    

#5

Kevin, you have certainly put your finger on the problem. I’ve reviewed how the operations department installed the machine, and they omitted zeroing the disk.

So, big thanks and big sorry - I should’ve checked it before asking this question.
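
A quick sanity check of this kind could be sketched as follows. It only reads the first part of a device (1 MiB by default, an arbitrary choice for this sketch) and reports whether those bytes are all zero, so it can show that a disk was not zeroized but cannot prove that the whole disk was. The function name is hypothetical, and reading a raw device normally requires root.

```python
def leading_bytes_are_zero(path, nbytes=1024 * 1024, chunk=128 * 1024):
    """Return True if the first `nbytes` bytes of `path` are all zero.

    Hypothetical helper: it samples only the start of the device, so a True
    result is not proof that the entire device has been zeroized.
    """
    remaining = nbytes
    with open(path, "rb") as f:
        while remaining > 0:
            data = f.read(min(chunk, remaining))
            if not data:
                break  # reached the end of the file or device
            if data.count(0) != len(data):
                return False  # found a non-zero byte
            remaining -= len(data)
    return True
```

For example, `leading_bytes_are_zero("/dev/sdb")` run on a freshly prepared node, before Aerospike has written anything, should return `True` if the disk was zeroized.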


#6

An un-zeroized disk is still unlikely, though not impossible, to cause this issue. Have you run Aerospike on these devices before (possibly with a different write-block-size)?

In the past, the ways I’ve seen disk corruption are:

  1. Disk was actually failing: Try running badblocks and checking S.M.A.R.T. diagnostics.
  2. Using the same device in multiple namespaces.
  3. Having multiple Aerospike instances using the same device (docker was involved).
  4. Not zeroizing disks that had Aerospike data on them and either had larger write-block-size or were single bin before.
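
Cause 2 in the list above can be checked mechanically. Here is a rough sketch that scans the text of an aerospike.conf-style file and reports any device path listed under more than one namespace. The function names and the simple brace-counting parser are assumptions made for this illustration. It is not an official Aerospike tool, and it looks only at the first path on each `device` line.

```python
import re
from collections import defaultdict


def devices_by_namespace(conf_text):
    """Map each device path to the namespaces whose config lists it.

    Hypothetical helper: a simplified brace-counting parser, not a full
    implementation of the Aerospike config grammar.
    """
    owners = defaultdict(list)
    namespace = None
    depth = 0
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        m = re.match(r"namespace\s+(\S+)\s*\{", line)
        if m and depth == 0:
            namespace = m.group(1)
        elif namespace and line.startswith("device "):
            owners[line.split()[1]].append(namespace)
        depth += line.count("{") - line.count("}")
        if depth == 0:
            namespace = None  # left the top-level block
    return dict(owners)


def shared_devices(conf_text):
    """Return only the devices that appear in more than one namespace."""
    return {dev: ns for dev, ns in devices_by_namespace(conf_text).items()
            if len(ns) > 1}
```

Run against the config from post #1, `shared_devices` would return an empty dict, since `/dev/sdb` appears only under namespace `abox`.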

#7

I’ve interviewed the guys from operations. The machines (and disks) have been reused, without zeroizing, after running a very disk-intensive application. The cluster has 5 nodes, and this behavior occurred on only one of them, so I would say it was just bad luck… (They swear that the disks are healthy and used by only one namespace and one Aerospike instance.) Sorry again, I should have been more mistrustful when I could find the error message only in the source code ;)