Back pressure mechanism in case of DEVICE_OVERLOAD error

szhem · December 20, 2019, 8:39am

Hello!

Here is my alternative version of this topic (created independently).

I have thought about back pressure too, but finally decided that it is not as simple to implement as it seems, because:

the info protocol, as @Albot suggested, returns the write-q value on a per-device basis
a single server may contain multiple devices
there are multiple servers in the cluster

So imagine that there is just a single slow disk in a cluster of multiple machines and its write-q grows. How do you decide whether to stop writes or continue if you don’t know which device a given record will land on?

Moreover, with the info protocol there is no guarantee that you won’t get a device overload error between two info requests.
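To make the problem concrete, here is a minimal sketch of the most obvious client-side gate. It assumes the client has already collected write-q per device into a dictionary; the `snapshot` layout, the node and device names, the depths, and the threshold are all made up for illustration, and nothing here calls a real client API.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot: (node, device) -> write-q depth, as a client might
# assemble it from per-node info responses. All values are invented.
WriteQSnapshot = Dict[Tuple[str, str], int]


def overloaded_devices(snapshot: WriteQSnapshot, threshold: int) -> List[Tuple[str, str]]:
    """Return the (node, device) pairs whose write-q is at or above the threshold."""
    return sorted(dev for dev, depth in snapshot.items() if depth >= threshold)


def should_pause(snapshot: WriteQSnapshot, threshold: int) -> bool:
    """Conservative gate: pause ALL writes if ANY device is over the threshold.

    The client cannot tell which device a given record will land on,
    so the only safe decision with this data is a cluster-wide one.
    """
    return bool(overloaded_devices(snapshot, threshold))


# Three nodes with two devices each; only one disk is slow.
snapshot: WriteQSnapshot = {
    ("node-1", "/dev/sdb"): 3,
    ("node-1", "/dev/sdc"): 5,
    ("node-2", "/dev/sdb"): 2,
    ("node-2", "/dev/sdc"): 250,  # the single slow disk
    ("node-3", "/dev/sdb"): 4,
    ("node-3", "/dev/sdc"): 1,
}
```

With this snapshot, `should_pause(snapshot, 128)` is true even though five of the six devices are nearly idle, so one slow disk throttles every write. And because the snapshot is only as fresh as the last info request, a device can still overflow before the next poll.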

What I’d like to understand here is whether there is any possibility of data loss when a client writes a lot of data with the non-blocking API, fills up the write cache (the max-write-cache config option), gets an error, sleeps for some period of time, and then retries all the requests that previously failed with device overload errors?
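The sleep-and-retry approach described above could be sketched roughly as follows. This is an illustration only: `DeviceOverloadError` is a hypothetical stand-in for whatever exception the client library raises on device overload, `write` is any caller-supplied function that stores one record, and the pause and round limits are arbitrary. It is written synchronously for brevity, although the question is about the non-blocking API.

```python
import time
from typing import Callable, Iterable, List, Tuple

Record = Tuple[str, dict]  # (key, bins); a made-up shape for illustration


class DeviceOverloadError(Exception):
    """Hypothetical stand-in for the client's device overload exception."""


def write_with_retry(
    records: Iterable[Record],
    write: Callable[[Record], None],
    pause_seconds: float = 0.5,
    max_rounds: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Record]:
    """Write every record, retrying only those rejected with device overload.

    Returns the records that still failed after max_rounds attempts,
    so the caller can decide what to do with them instead of losing them.
    """
    pending = list(records)
    for round_no in range(max_rounds):
        failed: List[Record] = []
        for rec in pending:
            try:
                write(rec)
            except DeviceOverloadError:
                failed.append(rec)  # keep it for the next round
        pending = failed
        if not pending:
            break
        if round_no < max_rounds - 1:
            # Back off a little longer each round to let the write queue drain.
            sleep(pause_seconds * (2 ** round_no))
    return pending
```

On the client side nothing is dropped as long as every rejected record stays in `pending` and the returned leftovers are handled. What the sketch cannot show is the server side of the question, that is, whether a write that was acknowledged while the write cache was full can still be lost.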
