Aerospike's disk not getting mounted after machine restart

Hi

We came across a scary problem. Our box running the Aerospike server rebooted, and the disk allocated to AS could not be mounted automatically, even though the appropriate entries for this disk were in fstab. The reason is that Linux could not identify the file system.

We were banking on disk storage for recovery purposes, but with this problem that now looks impossible.

Can someone please suggest a solution? Is there a known filesystem type we can pass to the mount command?

Or should we switch to using a filesystem instead of a raw drive? Will there be a performance hit with a filesystem compared to a raw device? Kindly help here. Thanks.

I’m a bit confused. Are you writing to a raw device which is mounted? If so, Aerospike will happily overwrite the filesystem for you :smile:. Be extra careful not to supply your root device; I have personally made that mistake :grimacing:.
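For context, a raw-device namespace points the storage engine straight at the block device, with no filesystem and no mount point involved. A minimal sketch (the device path and namespace name are placeholders, and other required namespace settings are omitted):

    namespace test {
        # replication-factor, memory sizing, etc. omitted for brevity
        storage-engine device {
            # raw block device: no filesystem, no fstab entry, no mount point
            device /dev/sdb
            write-block-size 128K
        }
    }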

There is a significant performance hit for going through the filesystem. There are ways to reduce the hit, such as mounting with options like noatime, but you will not regain all of the lost performance.
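If you do go the filesystem route, the namespace writes to a file on a mounted filesystem instead of a raw device. A hedged example of what the mount options might look like in /etc/fstab (device, mount point and filesystem are placeholders):

    # /etc/fstab entry for a data filesystem, with noatime to avoid access-time updates
    /dev/sdb1  /opt/aerospike/data  ext4  defaults,noatime  0 2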

I guess you should be able to continue using the disk? If not, and you have multiple disks [used by Aerospike] on this node, you can zeroize this disk and then start the server; Aerospike will treat it as a disk replacement and migrations will repopulate the disk. If you have only one disk, this is still true but less interesting :smiley:.

EDIT: Added “used by Aerospike”
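To make the "zeroize" step concrete, a destructive sketch (the device path is a placeholder; double-check it before running anything like this):

    # DESTRUCTIVE: erases the Aerospike data on the device, writing until it is full
    sudo dd if=/dev/zero of=/dev/sdb bs=1M status=progress
    # On SSD/NVMe, blkdiscard is a faster way to clear the whole device:
    # sudo blkdiscard /dev/sdb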

Hello @kporter

Thanks for your reply. Please see my inline comments.

I’m a bit confused. Are you writing to a raw device which is mounted? If so, Aerospike will happily overwrite the filesystem for you. Be extra careful not to supply your root device; I have personally made that mistake.

Yes, we are writing to a mounted raw device, mounted specifically for AS use. And yes, AS is happily overwriting the filesystem. I should confess that once in the past I also ended up allocating the disk with the OS installed to AS, and it was rightfully wiped (that is not the cause of my current problem, though).

I guess you should be able to continue using the disk? If not, and you have multiple disks on this node, you can zeroize this disk and then start the server; Aerospike will treat it as a disk replacement and migrations will repopulate the disk. If you have only one disk, this is still true but less interesting.

We have multiple disks on the system. One of them is allocated to AS; the other disks use ext4, while AS has written its own format. Everything was running fine until now. Then there was a hardware issue due to which the machine rebooted. All the other disks get mounted automatically, except the disk that was being used by AS.

Linux doesn’t identify AS’s format and doesn’t mount that disk. It gives the error: "mount: you must specify the filesystem type"

One has to wipe that disk clean and then mount it manually in a separate step. And this is actually my problem: I want the AS disk to mount itself (like the others) and then do a cold restart when the machine reboots.

Can you please suggest a solution for this without switching to a filesystem in place of the raw device? Sorry if I am still not clear; let me know what specifically I can explain further. Thanks.

When using raw devices, those devices shouldn’t be mounted.
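In practice that means the Aerospike device gets no fstab entry and no mount point at all; aerospike.conf references the block device directly (as in the earlier snippet). A quick way to check the device state (the path is a placeholder):

    # A raw Aerospike device has no filesystem signature, so:
    lsblk -f /dev/sdb    # FSTYPE and MOUNTPOINT columns should be empty
    blkid /dev/sdb       # typically prints nothing for a device without a filesystem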

I always used to think that disks need to be mounted to be used. :expressionless:

Sorry to bother you with my mistake. I will use it without mounting. Thanks again!

@ashishbhutani @kporter: Sorry to bother you, but how do you use a disk without mounting it? I have an Azure machine which mounts the disk on sda1, but after restarting the machine I do not see the disk in "df".

df shows mounted filesystems. Use lsblk instead to list block devices rather than mount points, @Gunjan_Sharma.
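For example, something like the following lists the devices whether or not they are mounted (the column choice is just a suggestion):

    # Raw Aerospike devices show up here with an empty MOUNTPOINT column
    lsblk -o NAME,SIZE,TYPE,MOUNTPOINT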

So it all goes fine, and I reference the raw disks as /dev/nvme2n1p1 … etc. after partitioning the disks with sgdisk.

But when I reboot the machine, AWS attaches the volumes in a different order: my root OS is not /dev/nvme2n1 and the data disks are /dev/nvme1n1…

How can I ensure that the OS does not change the order of the volumes, without referencing them by UUID in /etc/fstab?

That’s a common problem. We solved it by using Chef, which we have set up to detect which drives are mounted/used for the OS and which are used by Aerospike directly; we then sort by serial and ‘hand out’ the drives when writing the config file and other scripts.
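As a hedged sketch of that detection step (not our actual Chef code; the columns and logic are only illustrative): list serials and mount points, skip anything mounted, and hand out the rest sorted by serial.

    # Anything with a MOUNTPOINT is in use by the OS; the rest can go to Aerospike
    lsblk -dn -o NAME,SERIAL,MOUNTPOINT | sort -k2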

This might be worth looking at, but I’m not sure if it works in AWS: How to configure storage device to use the disk WWID. You should probably test it first by taking note of the WWIDs of the disks before a full stop and checking whether they still appear with the same WWIDs after start… Another thing I’m unsure of is whether the WWID will change if the instance migrates to a new physical host; perhaps AWS support might help? Anyway, to reiterate, the way we solve it is by simply excluding anything that is already mounted when Chef runs before starting up Aerospike. After a stop/start the ephemeral disks are empty anyway, so it’s fine if it chooses a new order, and we don’t care about namespace mixing since they’re all empty.

If you’re using EBS volumes, oof, you will probably only need to do the WWID step once per EBS volume, unless it has to be recreated for some reason.
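If it helps, a sketch of referencing a stable identifier instead of the kernel device name (the by-id path below is purely illustrative; check which symlinks actually exist on your instances):

    # Stable symlinks that survive enumeration-order changes across reboots:
    ls -l /dev/disk/by-id/

    # aerospike.conf can then point at one of those symlinks, e.g.:
    #     device /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0123456789abcdef0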


Can I reference the disk partition by UUID directly in the config (i.e., not using the WWID)?

I don’t think so. Can you dd from it as a device? For example, try ls -lad <my/test/path> && dd if=<my/test/path> bs=1 count=512 | hexdump -C. Do you see that it is a block device, and can you read data off of it? I think a UUID is only generated after a filesystem is created, so I don’t think this will work with raw block devices.

I have found an alternative workaround: basically, setting a partition label (PARTLABEL) in the GPT and using /dev/disk/by-partlabel/.

Indeed, the UUID is only created for a filesystem, but on OS boot (with Ubuntu 18.04 at least) the system probes the GPT partition tables for labels and makes them available. It survives reboot.

So my current configuration is:

        storage-engine device {
                device /dev/disk/by-partlabel/nvme-ssd-1-1 /dev/disk/by-partlabel/shadow-disk-1-1
                device /dev/disk/by-partlabel/nvme-ssd-1-2 /dev/disk/by-partlabel/shadow-disk-2-1
        }

and the cluster is healthy.
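For what it’s worth, a quick way to sanity-check that, assuming the standard Aerospike tools are installed on the node:

    asinfo -v status    # should return "ok" on each node
    asadm -e info       # cluster-wide summary, including per-namespace storage usage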

The labels are set with sgdisk, e.g. sgdisk -p /dev/nvme2n1 -c 2:shadow-disk-1-2,

and the NVMe device is found based on the volume_id, where the volume_id is itself found based on a volume label.
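Putting it together, a hedged end-to-end sketch (partition numbers, label names and device paths are all placeholders to adapt):

    # 1. Label the GPT partitions so udev creates /dev/disk/by-partlabel/ symlinks
    sudo sgdisk -c 1:nvme-ssd-1-1 /dev/nvme2n1
    sudo sgdisk -c 2:nvme-ssd-1-2 /dev/nvme2n1

    # 2. Confirm the symlinks exist (they are recreated automatically at every boot)
    ls -l /dev/disk/by-partlabel/

    # 3. Point aerospike.conf at the by-partlabel paths, as in the config above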


Nice. Thanks @Sentient for sharing, good to know.