Do SSDs have to be dd-ed before install?

Hi,

What’s the final word on whether SSDs need to be dd-ed before deploying Aerospike? I find the information in the docs a bit confusing. The SSD Initialization page says:

"The drives must be erased or zeroed out before use" 

but the following passage only mentions adding disks, not creating nodes with an initial disk:

"Starting from version 3.3.5, you no longer need to dd the existing remaining disks when adding a new disk or replacing an existing disk with a new disk in an existing namespace."

Also, the AWS recommendations page, which likewise uses dd for pre-warming, says the following:

"The I2, R3, and HI1 instance types use direct-attached solid state drives for instance (ephemeral) storage that provide maximum performance at launch time, without pre-warming."

So what does this mean? When creating a new node on an EC2 i2 instance with Aerospike 3.5 or 3.6, do I have to dd the disks first or not? If there is no need to dd the disks, how does Aerospike know which parts of the disk hold valid data and which do not?

Thanks!

Hi,

On GCE, with local SSDs, you don’t have to dd them.

Just modify your aerospike.conf file and launch the service.

You’ll only need to dd if you want to delete a large amount of test data, for example.

Regards.

Emmanuel

dd-ing the drives essentially deletes the data. I believe this is already done when the drives are set up in the instance.

However, should you ever want to delete the data on the drives, you should dd the entire drive. This makes sure you never accidentally read back garbage data from the SSD. It can take anywhere from many minutes to hours per drive, but several drives can be wiped in parallel.
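A minimal sketch of wiping whole drives in parallel, assuming Linux with GNU coreutils (blockdev, GNU stat and GNU dd). The function names are hypothetical, not Aerospike tools, and the script erases everything on the paths you pass it, so double-check them first.

```shell
#!/bin/sh
# Sketch: overwrite every byte of each target with zeros, one dd per target,
# all running in parallel. DESTROYS ALL DATA on the targets; nothing here
# checks whether a path is safe to wipe. Function names are hypothetical.

# Print the size in bytes of a block device or a regular file (GNU/Linux tools).
target_size() {
  if [ -b "$1" ]; then
    blockdev --getsize64 "$1"
  else
    stat -c %s "$1"
  fi
}

# Zero one target from start to end. iflag=count_bytes (GNU dd) makes count a
# byte count, so dd stops exactly at the end of the target instead of failing
# with "No space left on device" on a block device (or, for a regular file,
# writing forever). conv=notrunc keeps a regular file's length unchanged.
zero_whole() {
  zw_size=$(target_size "$1") || return 1
  dd if=/dev/zero of="$1" bs=1M count="$zw_size" iflag=count_bytes \
     conv=notrunc,fsync status=none
}

# Start one zero_whole per argument in the background, then wait for each;
# returns non-zero if any of them failed.
zero_all_parallel() {
  zap_pids=""
  for zap_dev in "$@"; do
    zero_whole "$zap_dev" &
    zap_pids="$zap_pids $!"
  done
  zap_rc=0
  for zap_pid in $zap_pids; do
    wait "$zap_pid" || zap_rc=1
  done
  return "$zap_rc"
}
```

Sourced into a root shell, it would be called as zero_all_parallel /dev/[device1] /dev/[device2]. Sizing each dd explicitly is a design choice so that a clean run exits with status 0 rather than ending on an error at the end of the device.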

There is another option. If you are only testing the instance, you can simply get rid of the headers that Aerospike sets up. I generally use the following:

dd if=/dev/zero of=/dev/[device] bs=1M count=1000

This should take only a few seconds. It is a bit of overkill, since the headers are not that big. However, only use this if you do not intend to stop and restart the node, or you may get corrupt data. I use this for testing, but I re-dd the drives BEFORE starting Aerospike every time.
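The header wipe above, applied to several drives at once, could look like the sketch below. It assumes GNU dd and that Aerospike is stopped while it runs; wipe_headers is a hypothetical helper, and its first argument is the number of MiB to zero at the start of each target (1000 in the command above).

```shell
#!/bin/sh
# Sketch: zero only the first N MiB of each target, which covers the headers
# Aerospike writes to the device. For test nodes only; run it while Aerospike
# is stopped. wipe_headers is a hypothetical name, not an Aerospike tool.

# Usage: wipe_headers MIB TARGET...
wipe_headers() {
  wh_mib=$1
  shift
  for wh_dev in "$@"; do
    # bs=1M is 1 MiB in GNU dd. conv=notrunc keeps a regular file's length,
    # but a regular file shorter than MIB MiB would still be grown, and a
    # smaller block device makes dd fail with "No space left on device".
    dd if=/dev/zero of="$wh_dev" bs=1M count="$wh_mib" conv=notrunc,fsync status=none || return 1
  done
}
```

On a test node this would be run as root, e.g. wipe_headers 1000 /dev/[device1] /dev/[device2], each time before starting Aerospike.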

The reason to dd on AWS is performance. AWS does not allocate all of the space you request until you use that space; by dd’ing the disks, you make AWS treat the entire space as in use. CAUTION! This will wipe all data from the disks, so do not do this on disks that already hold production data. Also take care not to zero out your root partition by mistake.
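One way to avoid zeroing the root partition by mistake is to check the mount table before any dd. This is a sketch under assumptions: Linux with /proc/mounts, and a hypothetical is_in_use helper. It only catches a disk whose name is a prefix of a mount source (e.g. /dev/sda when /dev/sda1 is mounted), so it misses LVM, RAID and device-mapper layers, swap, and /dev/disk/by-* symlinks; lsblk gives a fuller picture.

```shell
#!/bin/sh
# Sketch of a pre-flight check before a destructive dd. is_in_use is a
# hypothetical helper. $1 = device path, $2 = mount table (default /proc/mounts).
# Returns 0 ("in use") if any mount source starts with $1, 1 otherwise.
# The prefix match is deliberately conservative: /dev/sda also matches
# /dev/sda1, /dev/sda2, and so on.
is_in_use() {
  im_tab=${2:-/proc/mounts}
  # Fail closed: an unreadable mount table counts as "in use".
  [ -r "$im_tab" ] || return 0
  awk -v d="$1" 'index($1, d) == 1 { found = 1 } END { exit !found }' "$im_tab"
}
```

Used as a guard, it would look like: if is_in_use /dev/[device]; then echo "mounted, not wiping" >&2; else dd if=/dev/zero of=/dev/[device] bs=1M; fi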

Does this apply to i3 AWS instances that have instance store? Wouldn’t the whole disk be available in this case?

For many instance types, AWS no longer recommends initializing the disks.

So the suggestion made 3 years ago may be stale.