CRITICAL: drive set with unmatched headers

Hello,

My company is currently evaluating Aerospike as a replacement for our existing SQL database. I’m testing the various error scenarios that we might encounter, and I’m having problems testing how Aerospike copes with a drive failure.

I set up a simple machine with the following specification:

  • Core 2 Duo E7500 (ancient, I know)
  • 2x 4GB DDR 1333MHz
  • 2x Corsair Force F240 SSD
  • Ubuntu 15.04
  • Aerospike Community 3.6.2

At first, I used only one SSD and created a namespace with the following configuration:

namespace test {
    replication-factor 2
    memory-size 6G
    default-ttl 0
    storage-engine device {
        device /dev/sdb
        write-block-size 128K
    }
}

Then I did some read and write testing, which ran fine. After that, I added another SSD and changed the configuration to:

namespace test {
    replication-factor 2
    memory-size 6G
    default-ttl 0
    storage-engine device {
        device /dev/sdb
        device /dev/sdc
        write-block-size 128K
    }
}

Then I ran some more tests. So far, so good: I inserted 21 million sample records.

Then I turned off the machine and unplugged one of the SSDs. When I booted it again and started Aerospike, I got the following error:

CRITICAL (storage): (storage.c:as_storage_init:80) could not initialize storage for namespace test

I thought that was probably because Aerospike could not find /dev/sdc, which I had unplugged, so I reverted the configuration to use only /dev/sdb. Aerospike then started without complaint and ran well. However, (obviously?) some data went missing since the other SSD was unplugged. I then inserted another 1 million sample records, which also went well.

After that, I turned off the machine again, replugged the second SSD, and changed the configuration back to use both /dev/sdb and /dev/sdc. When I tried to start Aerospike, I was greeted with the following error message:

CRITICAL (drv_ssd): (drv_ssd.c:ssd_load_devices:3313) namespace test: drive set with unmatched headers - devices /dev/sdb & /dev/sdc have different signatures

So my questions are:

  1. How could I solve this, possibly with all data intact?
  2. What’s the correct way to make a single node tolerant to drive failures?

Thank you very much for your attention.

With replication factor 2, the data is still in the cluster so you could zeroize the returning disk.

However, this is a valid sequence. It sounds like you are testing, so are you interested in trying something that may potentially work? :smiling_imp:

CAUTION: Do not use the following untested procedure in production

You could make the drive headers match by…

Procedure removed because it could result in data loss

If you try this, it would be great to hear the results.

Hi,

Thanks, it actually did work. All the data seems to be back and okay! Here’s the startup log if you’re interested:

Nov 04 2015 10:03:00 GMT: INFO (as): (as.c::375) <><><><><><><><><><>  Aerospike Community Edition build 3.6.2  <><><><><><><><><><>
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) # Aerospike database configuration file.
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) 
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) service {
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     user root
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     group root
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     pidfile /var/run/aerospike/asd.pid
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     service-threads 4
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     transaction-queues 4
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     transaction-threads-per-queue 4
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     proto-fd-max 15000
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) }
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) 
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) logging {
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     # Log file must be an absolute path.
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     file /var/log/aerospike/aerospike.log {
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         context any info
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     }
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) }
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) 
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) network {
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     service {
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         address any
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         port 3000
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     }
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) 
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     heartbeat {
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         mode multicast
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         address 239.1.99.222
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         port 9918
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) 
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         # To use unicast-mesh heartbeats, remove the 3 lines above, and see
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         # aerospike_mesh.conf for alternative.
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) 
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         interval 150
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         timeout 10
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     }
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) 
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     fabric {
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         port 3001
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     }
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) 
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     info {
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         port 3003
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     }
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) }
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) 
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) namespace test {
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     replication-factor 2
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     memory-size 6G
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     default-ttl 0
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) 
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     storage-engine device {
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         device /dev/sdb
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         device /dev/sdc
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)         write-block-size 128K
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157)     }
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3157) }
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3175) system file descriptor limit: 100000, proto-fd-max: 15000
Nov 04 2015 10:03:00 GMT: INFO (cf:misc): (id.c::119) Node ip: 172.16.10.169
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3231) Rack Aware mode not enabled
Nov 04 2015 10:03:00 GMT: INFO (config): (cfg.c::3234) Node id bb9327a87fc8744
Nov 04 2015 10:03:00 GMT: INFO (namespace): (namespace_cold.c::101) ns test beginning COLD start
Nov 04 2015 10:03:00 GMT: INFO (drv_ssd): (drv_ssd.c::3491) usable device size must be header size 1048576 + multiple of 1048576, rounding down
Nov 04 2015 10:03:00 GMT: INFO (drv_ssd): (drv_ssd.c::3602) opened device /dev/sdb: usable size 189644406784, io-min-size 512
Nov 04 2015 10:03:00 GMT: INFO (drv_ssd): (drv_ssd.c::3491) usable device size must be header size 1048576 + multiple of 1048576, rounding down
Nov 04 2015 10:03:00 GMT: INFO (drv_ssd): (drv_ssd.c::3602) opened device /dev/sdc: usable size 189644406784, io-min-size 512
Nov 04 2015 10:03:00 GMT: INFO (drv_ssd): (drv_ssd.c::1107) /dev/sdb has 1446872 wblocks of size 131072
Nov 04 2015 10:03:00 GMT: INFO (drv_ssd): (drv_ssd.c::1107) /dev/sdc has 1446872 wblocks of size 131072
Nov 04 2015 10:03:01 GMT: INFO (drv_ssd): (drv_ssd.c::3136) device /dev/sdb: reading device to load index
Nov 04 2015 10:03:01 GMT: INFO (drv_ssd): (drv_ssd.c::3141) In TID 1689: Using arena #150 for loading data for namespace "test"
Nov 04 2015 10:03:01 GMT: INFO (drv_ssd): (drv_ssd.c::3136) device /dev/sdc: reading device to load index
Nov 04 2015 10:03:01 GMT: INFO (drv_ssd): (drv_ssd.c::3141) In TID 1690: Using arena #150 for loading data for namespace "test"
Nov 04 2015 10:03:03 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 878093 records, 0 subrecords, /dev/sdb 0%, /dev/sdc 0%
Nov 04 2015 10:03:05 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 1709512 records, 0 subrecords, /dev/sdb 0%, /dev/sdc 0%
Nov 04 2015 10:03:07 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 2525348 records, 0 subrecords, /dev/sdb 0%, /dev/sdc 0%
Nov 04 2015 10:03:09 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 3328077 records, 0 subrecords, /dev/sdb 0%, /dev/sdc 0%
Nov 04 2015 10:03:11 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 4132473 records, 0 subrecords, /dev/sdb 0%, /dev/sdc 0%
Nov 04 2015 10:03:13 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 4926261 records, 0 subrecords, /dev/sdb 0%, /dev/sdc 0%
Nov 04 2015 10:03:15 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 5708010 records, 0 subrecords, /dev/sdb 0%, /dev/sdc 0%
Nov 04 2015 10:03:17 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 6486540 records, 0 subrecords, /dev/sdb 0%, /dev/sdc 0%
Nov 04 2015 10:03:19 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 7270485 records, 0 subrecords, /dev/sdb 0%, /dev/sdc 0%
Nov 04 2015 10:03:21 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 8047954 records, 0 subrecords, /dev/sdb 1%, /dev/sdc 1%
Nov 04 2015 10:03:23 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 8827293 records, 0 subrecords, /dev/sdb 1%, /dev/sdc 1%
Nov 04 2015 10:03:25 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 9600204 records, 0 subrecords, /dev/sdb 1%, /dev/sdc 1%
Nov 04 2015 10:03:27 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 10389235 records, 0 subrecords, /dev/sdb 1%, /dev/sdc 1%
Nov 04 2015 10:03:28 GMT: INFO (drv_ssd): (drv_ssd.c::3162) device /dev/sdc: read complete: UNIQUE 5499196 (REPLACED 0) (GEN 0) (EXPIRED 0) (MAX-TTL 0) records
Nov 04 2015 10:03:29 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 11068140 records, 0 subrecords, /dev/sdb 1%, /dev/sdc 100%
Nov 04 2015 10:03:31 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 11478895 records, 0 subrecords, /dev/sdb 1%, /dev/sdc 100%
Nov 04 2015 10:03:33 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 11889379 records, 0 subrecords, /dev/sdb 1%, /dev/sdc 100%
Nov 04 2015 10:03:35 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 12287496 records, 0 subrecords, /dev/sdb 1%, /dev/sdc 100%
Nov 04 2015 10:03:37 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 12687998 records, 0 subrecords, /dev/sdb 1%, /dev/sdc 100%
Nov 04 2015 10:03:39 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 13086535 records, 0 subrecords, /dev/sdb 1%, /dev/sdc 100%
Nov 04 2015 10:03:41 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 13487556 records, 0 subrecords, /dev/sdb 2%, /dev/sdc 100%
Nov 04 2015 10:03:43 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 13892396 records, 0 subrecords, /dev/sdb 2%, /dev/sdc 100%
Nov 04 2015 10:03:45 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 14294345 records, 0 subrecords, /dev/sdb 2%, /dev/sdc 100%
Nov 04 2015 10:03:47 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 14696239 records, 0 subrecords, /dev/sdb 2%, /dev/sdc 100%
Nov 04 2015 10:03:49 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 15099170 records, 0 subrecords, /dev/sdb 2%, /dev/sdc 100%
Nov 04 2015 10:03:51 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 15502984 records, 0 subrecords, /dev/sdb 2%, /dev/sdc 100%
Nov 04 2015 10:03:53 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 15904162 records, 0 subrecords, /dev/sdb 2%, /dev/sdc 100%
Nov 04 2015 10:03:55 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 16305901 records, 0 subrecords, /dev/sdb 2%, /dev/sdc 100%
Nov 04 2015 10:03:57 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 16705908 records, 0 subrecords, /dev/sdb 2%, /dev/sdc 100%
Nov 04 2015 10:03:59 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 17101937 records, 0 subrecords, /dev/sdb 3%, /dev/sdc 100%
Nov 04 2015 10:04:01 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 17500001 records, 0 subrecords, /dev/sdb 3%, /dev/sdc 100%
Nov 04 2015 10:04:03 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 17901570 records, 0 subrecords, /dev/sdb 3%, /dev/sdc 100%
Nov 04 2015 10:04:05 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 18300942 records, 0 subrecords, /dev/sdb 3%, /dev/sdc 100%
Nov 04 2015 10:04:07 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 18698657 records, 0 subrecords, /dev/sdb 3%, /dev/sdc 100%
Nov 04 2015 10:04:09 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 19088943 records, 0 subrecords, /dev/sdb 3%, /dev/sdc 100%
Nov 04 2015 10:04:11 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 19475518 records, 0 subrecords, /dev/sdb 3%, /dev/sdc 100%
Nov 04 2015 10:04:13 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 19873566 records, 0 subrecords, /dev/sdb 3%, /dev/sdc 100%
Nov 04 2015 10:04:15 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 20265697 records, 0 subrecords, /dev/sdb 3%, /dev/sdc 100%
Nov 04 2015 10:04:17 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 20663287 records, 0 subrecords, /dev/sdb 3%, /dev/sdc 100%
Nov 04 2015 10:04:19 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 21065525 records, 0 subrecords, /dev/sdb 4%, /dev/sdc 100%
Nov 04 2015 10:04:21 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 21468245 records, 0 subrecords, /dev/sdb 4%, /dev/sdc 100%
Nov 04 2015 10:04:23 GMT: INFO (drv_ssd): (drv_ssd.c::3937) {test} loaded 21868081 records, 0 subrecords, /dev/sdb 4%, /dev/sdc 100%
Nov 04 2015 10:04:23 GMT: INFO (drv_ssd): (drv_ssd.c::3162) device /dev/sdb: read complete: UNIQUE 16500804 (REPLACED 0) (GEN 0) (EXPIRED 0) (MAX-TTL 0) records
Nov 04 2015 10:04:23 GMT: INFO (drv_ssd): (drv_ssd.c::1072) ns test loading free & defrag queues
Nov 04 2015 10:04:23 GMT: INFO (drv_ssd): (drv_ssd.c::1006) /dev/sdc init defrag profile: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Nov 04 2015 10:04:23 GMT: INFO (drv_ssd): (drv_ssd.c::1006) /dev/sdb init defrag profile: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Nov 04 2015 10:04:23 GMT: INFO (drv_ssd): (drv_ssd.c::1096) /dev/sdb init wblock free-q 1384186, defrag-q 1
Nov 04 2015 10:04:23 GMT: INFO (drv_ssd): (drv_ssd.c::1096) /dev/sdc init wblock free-q 1425976, defrag-q 1
Nov 04 2015 10:04:23 GMT: INFO (drv_ssd): (drv_ssd.c::2373) ns test starting device maintenance threads
Nov 04 2015 10:04:23 GMT: INFO (drv_ssd): (drv_ssd.c::1488) ns test starting write worker threads
Nov 04 2015 10:04:23 GMT: INFO (drv_ssd): (drv_ssd.c::923) ns test starting defrag threads
Nov 04 2015 10:04:23 GMT: INFO (as): (as.c::415) initializing services...
Nov 04 2015 10:04:23 GMT: INFO (tsvc): (thr_tsvc.c::910) shared queues: 4 queues with 4 threads each
Nov 04 2015 10:04:23 GMT: INFO (hb): (hb.c::2485) heartbeat socket initialization
Nov 04 2015 10:04:23 GMT: INFO (hb): (hb.c::2489) initializing multicast heartbeat socket : 239.1.99.222:9918
Nov 04 2015 10:04:23 GMT: INFO (paxos): (paxos.c::3121) partitions from storage: total 4096 found 4096 lost(set) 0 lost(unset) 0
Nov 04 2015 10:04:23 GMT: INFO (partition): (partition.c::3941) {test} 4096 partitions: found 0 absent, 4096 stored
Nov 04 2015 10:04:23 GMT: INFO (paxos): (paxos.c::3125) Paxos service ignited: bb9327a87fc8744
Nov 04 2015 10:04:24 GMT: INFO (batch): (batch.c::562) Initialize batch-index-threads to 4
Nov 04 2015 10:04:24 GMT: INFO (batch): (thr_batch.c::346) Initialize batch-threads to 4
Nov 04 2015 10:04:24 GMT: INFO (drv_ssd): (drv_ssd.c::4107) {test} floor set at 41 wblocks per device
Nov 04 2015 10:04:26 GMT: INFO (paxos): (paxos.c::3206) listening for other nodes (max 3000 milliseconds) ...
Nov 04 2015 10:04:29 GMT: INFO (paxos): (paxos.c::3223) ... no other nodes detected - node will operate as a single-node cluster
Nov 04 2015 10:04:29 GMT: INFO (partition): (partition.c::3996) {test} 0 absent partitions promoted to master
Nov 04 2015 10:04:29 GMT: INFO (paxos): (paxos.c::3183) paxos supervisor thread started
Nov 04 2015 10:04:29 GMT: INFO (nsup): (thr_nsup.c::1144) namespace supervisor started
Nov 04 2015 10:04:29 GMT: INFO (ldt): (thr_nsup.c::1107) LDT supervisor started
Nov 04 2015 10:04:29 GMT: INFO (demarshal): (thr_demarshal.c::260) Saved original JEMalloc arena #7 for thr_demarshal()
Nov 04 2015 10:04:29 GMT: INFO (demarshal): (thr_demarshal.c::288) Service started: socket 3000
Nov 04 2015 10:04:30 GMT: INFO (demarshal): (thr_demarshal.c::260) Saved original JEMalloc arena #8 for thr_demarshal()
Nov 04 2015 10:04:30 GMT: INFO (demarshal): (thr_demarshal.c::260) Saved original JEMalloc arena #9 for thr_demarshal()
Nov 04 2015 10:04:30 GMT: INFO (demarshal): (thr_demarshal.c::260) Saved original JEMalloc arena #10 for thr_demarshal()
Nov 04 2015 10:04:31 GMT: INFO (demarshal): (thr_demarshal.c::812) Waiting to spawn demarshal threads ...
Nov 04 2015 10:04:31 GMT: INFO (demarshal): (thr_demarshal.c::815) Started 4 Demarshal Threads
Nov 04 2015 10:04:31 GMT: INFO (as): (as.c::452) service ready: soon there will be cake!
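
One quick way to sanity-check a cold start like this is to add up the per-device "read complete" UNIQUE counts and compare the total with the number of records you expect. Below is a minimal Python sketch that does this for the two summary lines in the log above. The function name and the regular expression are illustrative and not part of any Aerospike tooling. A matching total only shows that the expected number of records was indexed; it does not prove the data is consistent.

```python
import re

# Matches the per-device summary line logged at the end of a cold start, e.g.
# "... device /dev/sdc: read complete: UNIQUE 5499196 (REPLACED 0) ..."
READ_COMPLETE = re.compile(r"device (\S+): read complete: UNIQUE (\d+)")


def unique_records_per_device(log_text):
    """Return {device: UNIQUE record count} parsed from 'read complete' lines."""
    counts = {}
    for line in log_text.splitlines():
        match = READ_COMPLETE.search(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


# The two summary lines copied verbatim from the startup log.
sample_log = """\
Nov 04 2015 10:03:28 GMT: INFO (drv_ssd): (drv_ssd.c::3162) device /dev/sdc: read complete: UNIQUE 5499196 (REPLACED 0) (GEN 0) (EXPIRED 0) (MAX-TTL 0) records
Nov 04 2015 10:04:23 GMT: INFO (drv_ssd): (drv_ssd.c::3162) device /dev/sdb: read complete: UNIQUE 16500804 (REPLACED 0) (GEN 0) (EXPIRED 0) (MAX-TTL 0) records
"""

counts = unique_records_per_device(sample_log)
total = sum(counts.values())
print(counts, total)
# {'/dev/sdc': 5499196, '/dev/sdb': 16500804} 22000000
```

The total of 22,000,000 lines up with the 21 million records inserted while both drives were configured plus the 1 million inserted while only /dev/sdb was in use.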

Now then, I will do some more testing, and hopefully it will be smooth sailing.

Again, thank you very much!

Alright, so you should never do this again :). After discussing this issue, it turns out there are good reasons the drive is rejected, and tricking the header as we did could result in data loss. The drive that was never removed will have changed partition versions and will also hold some of the same partitions that are on the other drive. This condition is impossible without tricking the headers, so it is not handled.

In the future, if you run into this situation, you will need to zeroize the returning drive before adding it back to the node.

I see.

However, when running only a single machine, is there any way to recover data after a drive failure? Or should I rely on RAID and backups?

Thanks.

We are typically clustered so this isn’t a normal pattern for us.

You could create a backup of the contents stored on the removed disk by:

  1. Back up your aerospike.conf.
  2. In your aerospike.conf, replace the existing disk with the removed disk.
  3. Start Aerospike.
  4. Perform a backup of Aerospike.
  5. Once the backup finishes, stop Aerospike.
  6. Restore your aerospike.conf and add the removed disk back into your list of devices.
  7. Now zeroize the removed disk.
  8. Start Aerospike.
  9. Now restore your backup.
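
The steps above can be sketched as a dry-run plan in Python. The sketch only builds and prints the commands and never runs them, since the `dd` step destroys everything on the target device. Several details are assumptions rather than anything stated in this thread: the config path `/etc/aerospike/aerospike.conf`, starting and stopping the server with `service aerospike ...`, the backup directory, and the function name. The `asbackup`/`asrestore` options shown are the usual `--namespace` and `--directory` ones, but check `--help` for your version before relying on them.

```python
def single_node_recovery_plan(removed_dev, namespace, backup_dir,
                              conf="/etc/aerospike/aerospike.conf"):
    """Return the procedure as ordered (description, command) pairs.

    Nothing is executed here. `command` is None for steps that are
    manual edits of the configuration file.
    """
    saved = conf + ".bak"  # hypothetical location for the saved config
    return [
        ("Back up aerospike.conf", ["cp", conf, saved]),
        (f"Edit {conf}: replace the existing device with {removed_dev}", None),
        ("Start Aerospike", ["service", "aerospike", "start"]),
        ("Back up the namespace",
         ["asbackup", "--namespace", namespace, "--directory", backup_dir]),
        ("Stop Aerospike", ["service", "aerospike", "stop"]),
        (f"Restore aerospike.conf, then add {removed_dev} to the device list",
         ["cp", saved, conf]),
        (f"Zeroize {removed_dev} (DESTROYS its contents)",
         ["dd", "if=/dev/zero", f"of={removed_dev}", "bs=1M"]),
        ("Start Aerospike", ["service", "aerospike", "start"]),
        ("Restore the backup", ["asrestore", "--directory", backup_dir]),
    ]


# Example values; the backup directory is a placeholder.
plan = single_node_recovery_plan("/dev/sdc", "test", "/tmp/sdc-backup")
for number, (description, command) in enumerate(plan, start=1):
    shown = " ".join(command) if command else "(manual edit)"
    print(f"{number}. {description}: {shown}")
```

Printing the plan first makes it easy to double-check the device name before anything destructive is typed by hand.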