How To recover data from faulty disks when a multi-node disk failure occurs
Context

If disks fail unnoticed on multiple nodes over time, data loss can occur when migrations do not complete between failures. For this to happen, disks must fail on at least as many nodes as the configured data replication-factor. If data must be recovered, several approaches exist, as outlined below:

  1. failover to a secondary cluster, which is kept up to date by XDR
  2. recovery of data using an existing point-in-time backup into a working cluster (or existing cluster once the disks have been replaced on the faulty nodes)
  3. sector-by-sector disk data recovery

In almost all cases, if possible, it is best to failover to a secondary cluster, as the data will be most up to date. If a secondary cluster is not present, data can be recovered using a point-in-time backup or, failing that, a sector-by-sector recovery.

This article describes how to recover data from a faulty disk by performing sector-by-sector copy.

Please note that if sector-by-sector data recovery is required, it should be performed as soon as possible: more bad sectors can appear with every passing moment, even if the disk is not receiving any writes. The behaviour of a faulty disk, or of software using a faulty disk, is unpredictable.

Method

If a spare disk can be inserted into the node while the faulty disk is still attached, the recovery can be done directly from one disk to the other. Otherwise, a backup disk image file can be created first and recovered from afterwards. The steps below demonstrate both methods.

Copy data from one disk to another, ignoring errors

This can be done if the faulty disk and a new, empty disk are present on the machine at the same time.

dd if=/dev/nvme0n1 of=/dev/nvme1n1 bs=4096 conv=sync,noerror
sync
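On a large disk this copy can run for hours with no output. As an optional addition, assuming GNU coreutils dd, the status=progress flag prints bytes copied as the transfer runs, and sending SIGUSR1 to an already-running dd prints the same statistics on demand:

```shell
# Same copy as above, with periodic progress output (GNU dd only)
dd if=/dev/nvme0n1 of=/dev/nvme1n1 bs=4096 conv=sync,noerror status=progress
sync

# Alternatively, ask a dd that is already running for its statistics:
# kill -USR1 "$(pgrep -x dd)"
```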

Copy data from one disk to another using an intermediate file, ignoring errors

Use this method if a backup must be taken first, the faulty disk replaced, and the backup then recovered onto the new disk.

Copy disk to file:

dd if=/dev/nvme0n1 of=/some/backup/file.img bs=4096 conv=sync,noerror
sync

Recover from file to disk:

dd if=/some/backup/file.img of=/dev/nvme0n1 bs=4096 conv=sync,noerror
sync
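Before replacing the faulty disk, it can be worth confirming that the image captured the whole device. A small sketch, assuming the source disk is still attached as /dev/nvme0n1 (blockdev is part of util-linux and requires root): with conv=sync,noerror the image should be exactly the size of the source device, rounded up to the block size.

```shell
# Size of the source block device in bytes
blockdev --getsize64 /dev/nvme0n1

# Size of the image file in bytes - the two should match
stat -c %s /some/backup/file.img
```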

Copy data from one disk to another using an intermediate file, ignoring errors, using gzip

Use this method if a backup must be taken first, the faulty disk replaced, and the backup then recovered onto the new disk. It can reduce the disk space required to hold the backup file and may be faster due to decreased I/O.

Copy disk to file:

dd if=/dev/nvme0n1 bs=4096 conv=sync,noerror | gzip > /some/backup/file.img.gz
sync

Recover from file to disk:

zcat /some/backup/file.img.gz | dd of=/dev/nvme0n1 bs=4096 conv=sync,noerror
sync
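The compressed round trip can be sanity-checked on a small scratch file before running it against a real device. The sketch below uses hypothetical paths under /tmp; apart from zero-padding of a short final block, the recovered data should be byte-identical to the source:

```shell
# Create a 4 KiB scratch "disk" and run it through the same pipeline
dd if=/dev/urandom of=/tmp/scratch.bin bs=4096 count=1
dd if=/tmp/scratch.bin bs=4096 conv=sync,noerror | gzip > /tmp/scratch.img.gz
zcat /tmp/scratch.img.gz | dd of=/tmp/restored.bin bs=4096 conv=sync,noerror

# cmp exits 0 when the two files are identical
cmp /tmp/scratch.bin /tmp/restored.bin && echo "round trip OK"
```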

Notes

For recovery of data from file-backed storage, it is advisable to clone the whole disk before extracting the data file, in order to take filesystem behaviour out of the equation.

Data recovery takes time, even for fast disks with low bad sector count. Be prepared for a long wait period.

Some data on the disk WILL be corrupt. The sector-by-sector copy recovers every sector on the disk that can still be read; any 4096-byte block that cannot be read is padded with zeroes. This process recovers all readable data onto a good disk, but it is impossible to recover the corrupt sectors from a faulty disk, as that data has already been lost.
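The padding behaviour of conv=sync can be observed on an ordinary file: any block that comes up short (or, on a faulty disk, unreadable) is filled with zeroes to the full block size. A small sketch using a scratch file:

```shell
# Create a 5000-byte file: one full 4096-byte block plus a 904-byte tail
head -c 5000 /dev/urandom > /tmp/short.bin

# conv=sync pads the short final block with zeroes to a full 4096 bytes
dd if=/tmp/short.bin of=/tmp/padded.bin bs=4096 conv=sync,noerror

stat -c %s /tmp/padded.bin   # prints 8192: two full 4096-byte blocks
```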

A 4096-byte block size is chosen for dd because it matches the default sector size of most current disks. A smaller size would slow the recovery considerably while providing no benefit, as data is still read from the disk itself in 4 KiB chunks. A larger size could speed up the copy, but would leave a larger hole of zeroed data around each bad sector found.

Keywords

DATA RECOVERY READ ERROR DD DISK FAULT

Timestamp

September 2020

© 2015 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.