Hi, since we started using a GCE cluster with local SSD disks, we have been facing instability problems.
We now have some answers.
First, disable swap, because it is placed on the local SSD disk. When Aerospike is configured with indexes only in RAM and data on SSD, the entire disk is used, which drops the space reserved for swap; the system can still try to swap and will then fail.
The other big problem is instances failing for no apparent reason: CPU, disk space, and RAM usage all look fine, but the server becomes unreachable and must be restarted. If you restart a local SSD instance, you lose the disk too.
We just received a notice from Google that may help explain this problem:
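To check whether a node still has swap enabled, a minimal sketch along these lines may help. It only parses the standard Linux `/proc/swaps` table; the function name `active_swap_devices` is my own and is not part of Aerospike or GCE tooling.

```python
def active_swap_devices(proc_swaps_text):
    """Return the swap devices/files listed in /proc/swaps-style text.

    The first line of /proc/swaps is a header
    (Filename, Type, Size, Used, Priority), so it is skipped.
    """
    lines = proc_swaps_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


if __name__ == "__main__":
    # Read the live table on the node being checked.
    with open("/proc/swaps") as f:
        devices = active_swap_devices(f.read())
    if devices:
        print("Swap is still active on:", ", ".join(devices))
    else:
        print("No active swap.")
```

If it reports an active device, `swapoff -a` (run as root) turns swap off until the next boot; the matching entry in `/etc/fstab` also has to be removed to make that permanent.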
"Dear Google Cloud Platform customer,
We have detected that your Google Cloud Developer project currently has a Compute Engine instance using a Local SSD device via the NVMe interface:
Google has identified an edge case performance issue with the NVMe interface that might affect any Local SSD workload that uses the NVME interface. We strongly advise you to use only the SCSI interface with Local SSD for the time being.
Google is actively working on this issue and will notify you once it has been mitigated.
PLEASE NOTE: Responses to this email message will not be monitored. If you have any questions or concerns and have a Silver, Gold or Platinum Cloud Platform support package, please open a support case via the Support Center: https://enterprise.google.com/supportcenter
Bronze customers can contact us via: https://support.google.com/cloud/contact/local_ssd_on_nvme
Best regards, Google Cloud Platform Support "
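To see quickly whether an instance is affected, here is a rough sketch that sorts the kernel's block devices by interface. It assumes the usual Linux naming, where NVMe disks show up as `nvme*` and SCSI disks as `sd*` under `/sys/block`; the function name `classify_block_devices` is hypothetical, and the `sd*` list will also contain the persistent boot disk, not only local SSDs.

```python
import os


def classify_block_devices(names):
    """Group block device names by interface, based on their name prefix."""
    result = {"nvme": [], "scsi": []}
    for name in sorted(names):
        if name.startswith("nvme"):
            result["nvme"].append(name)
        elif name.startswith("sd"):
            result["scsi"].append(name)
        # Other devices (loop, ram, ...) are ignored.
    return result


if __name__ == "__main__":
    # /sys/block has one entry per block device known to the kernel.
    found = classify_block_devices(os.listdir("/sys/block"))
    print("NVMe devices:", found["nvme"])
    print("SCSI devices:", found["scsi"])
```

Any entry in the NVMe list means the instance uses the interface Google advises against for now.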
Best regards.
Emmanuel VINET