What happens if the device order in a namespace is accidentally changed



Update

In version 4.2.0.2 we have made device configuration order independent across fast restart as well. This is therefore not an issue from version 4.2.0.2 onwards.

Background

If you are using storage-engine device for a namespace, your configuration might look something like this:

namespace test {
	replication-factor 2
	memory-size 512G
	storage-engine device {
		device /dev/sdb
		device /dev/sdc
		device /dev/sdd
		write-block-size 1M
	}
}

Now, if during a reconfiguration you change the order of the devices (e.g. sdc now comes before sdb), or you accidentally reinsert the disks into the hardware in the wrong order following maintenance, Aerospike will not work.
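One way to reduce the risk of accidental reordering after maintenance (a suggestion, not something from this article) is to reference the devices by persistent udev symlinks instead of kernel-assigned names like /dev/sdb, which can change between reboots. The by-id names below are placeholders; check /dev/disk/by-id/ on your own system for the actual symlinks:

```
namespace test {
	replication-factor 2
	memory-size 512G
	storage-engine device {
		device /dev/disk/by-id/ata-EXAMPLE_SERIAL_1
		device /dev/disk/by-id/ata-EXAMPLE_SERIAL_2
		device /dev/disk/by-id/ata-EXAMPLE_SERIAL_3
		write-block-size 1M
	}
}
```

These symlinks are tied to the physical drive rather than to the order in which the kernel detects it, so the configuration keeps pointing at the same disk even if the detection order changes.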

Issue

When you configure devices, Aerospike treats them, in the given order, as a single contiguous address space. If you swap sdc with sdb in the above example, the middle of that space becomes the front, essentially resulting in a corrupted device. The file-based equivalent would be a single database file rewritten so that the middle third of the file becomes the beginning. This will most definitely not work, and you will end up with asd complaining about record errors and possibly with data corruption.
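As a rough illustration of why reordering corrupts the logical space, here is a small Python sketch (not Aerospike code; the tiny 8-byte "devices" and the read/write helpers are invented for the example). A record written at a fixed logical offset lands on a different physical device once the order changes:

```python
# Three tiny "devices", treated as one contiguous space in configuration order.
sdb = bytearray(8)
sdc = bytearray(8)
sdd = bytearray(8)
devices = [sdb, sdc, sdd]

def write(devs, offset, data):
    # Map a logical offset onto the concatenated device space.
    # (Sketch only: assumes the data does not span a device boundary.)
    for dev in devs:
        if offset < len(dev):
            dev[offset:offset + len(data)] = data
            return
        offset -= len(dev)

def read(devs, offset, length):
    for dev in devs:
        if offset < len(dev):
            return bytes(dev[offset:offset + length])
        offset -= len(dev)

write(devices, 10, b"rec")                  # logical offset 10 falls inside sdc
assert read(devices, 10, 3) == b"rec"       # readable with the original order

swapped = [sdc, sdb, sdd]                   # sdc listed before sdb, as in the mishap
assert read(swapped, 10, 3) != b"rec"       # same offset now points into sdb
```

The data is still physically present on sdc, but every logical offset now resolves to the wrong device, which is why asd sees garbage where it expects valid block headers.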

The ‘asd’ process may show errors similar to these, suggesting a corrupt disk data block:

Jun 25 2018 14:20:01 GMT: WARNING (drv_ssd): (drv_ssd.c:1241) load_n_bins: failed ssd_read_record()
Jun 25 2018 14:20:01 GMT: WARNING (drv_ssd): (drv_ssd.c:1204) read: bad block magic offset 89810986496
Jun 25 2018 14:20:01 GMT: WARNING (drv_ssd): (drv_ssd.c:1259) load_bins: failed ssd_read_record()
Jun 25 2018 14:20:01 GMT: WARNING (rw): (read.c:349) {test} read_local: found record with no bins <Digest>:...

Fix

In cases such as this one, assuming the issue affects only 1 node (or at most replication-factor - 1 nodes), it is best to cold start the affected node(s) empty:

  1. Stop Aerospike.
  2. Erase the data on all Aerospike data disks (if using files as storage, delete the files): Zeroize multiple ssds simultaneously
  3. Fix your configuration to be as you want it.
  4. Start Aerospike.
  5. Wait for migrations to finish.
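On a systemd-based system, the steps above could look roughly like the following (a sketch only: the service name aerospike, the device names /dev/sdb, /dev/sdc and /dev/sdd, the configuration path, and the availability of blkdiscard and asadm are all assumptions; double-check the device names before wiping anything):

```shell
# Sketch only -- verify the device names first, this destroys data.
sudo systemctl stop aerospike

# Erase the data disks so the node cold starts empty.
# blkdiscard is fast on SSDs that support TRIM; dd is the fallback.
for dev in /dev/sdb /dev/sdc /dev/sdd; do
    sudo blkdiscard "$dev" || sudo dd if=/dev/zero of="$dev" bs=1M
done

# Fix the device order in /etc/aerospike/aerospike.conf, then:
sudo systemctl start aerospike

# Monitor the cluster until migrations complete:
asadm -e "info"
```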

This will cold start Aerospike on the node without any data, and migrations will handle the rest. Since the node starts empty (as when a new node is added), migrations might take a while.

If you ended up reversing the device order on replication-factor or more nodes, the issue becomes more complicated, assuming you cannot afford to lose data. In this case, you could try to stop the affected nodes, restore the original device order and start them again. If no new writes happened to further corrupt data, the nodes should start as expected. Unfortunately, if you are unable to start enough nodes (at least replication-factor - 1), you will have data loss. In such a case, if you have XDR configured, you can restore from the XDR destination, as explained here: How to migrate aerospike from one cluster to another or restore from a backup.

Keywords

XDR bad block magic offset changed device order

Timestamp

6/26/2018