Hi, I have a cluster of 8 nodes. Recently, some problems occurred while it was handling high write pressure: disk-avail-pct dropped extremely low and eventually reached stop-write. Here's the info about my cluster:
The problems occurred on the namespace user_durable_list, which stores almost exclusively list data. At the start, the namespace was restored from a backup; at that point its disk-usage-pct was about 14 and its disk-avail-pct was about 85. It then began processing write requests. The write pressure is high, and the monitor shows disk IO utilization above 90%. Meanwhile, disk-avail-pct keeps dropping. As you can see in the screenshot, on certain nodes it fell from 80+ to 10+. It eventually drops to about 4 or 5, and then writes stop. I've read several forum articles about disk-usage-pct, but I still can't figure out why disk-avail-pct drops so low. Please help.
BTW, I'm considering upgrading the cluster from version 3.16 to the newest stable 4.x release. Are there any special steps I should know about?