Rolling restarts without Quiesce and "fail_key_busy" - Correlation or causation?

Jeff.MacDonald · November 16, 2022, 6:31pm

In a cluster of about 20 machines, we were having a fairly consistent rate of “fail_key_busy” errors.

We needed to do a rolling restart, and at the time our Ansible code did not quiesce each node before restarting it.

After the most recent rolling restart, the number of fail_key_busy alerts per day began to increase steadily.

Is there any causation there?

Albot · November 18, 2022, 1:31am

Well, fail_key_busy is the server pushing back because of contention. If you have nodes out of the cluster or ongoing migrations, the cluster is under increased stress and transactions take longer, so it does make some sense that you see an increase. You can read more about tuning in "Hot Key error code 14".
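Since the error is pushback under contention, one common client-side response is to retry with a jittered exponential backoff rather than immediately hammering the same key again. The following is only a minimal sketch under stated assumptions: `KeyBusyError` and `retry_on_key_busy` are hypothetical names made up for illustration, not part of any Aerospike client, and the delay values are arbitrary. Your real client will raise its own exception type for result code 14.

```python
import random
import time

KEY_BUSY = 14  # result code named in the thread ("Hot Key error code 14")


class KeyBusyError(Exception):
    """Hypothetical stand-in for whatever your client raises on key busy."""
    code = KEY_BUSY


def retry_on_key_busy(op, max_attempts=5, base_delay=0.01, max_delay=0.5,
                      sleep=time.sleep):
    """Call op(); on KeyBusyError, back off and try again.

    The wait before retry number n is drawn uniformly from
    [0, min(max_delay, base_delay * 2**n)], so concurrent clients
    do not all retry the contended key at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return op()
        except KeyBusyError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

You would wrap a single read or write in it, for example `retry_on_key_busy(lambda: do_write(key, bins))`, where `do_write` is your own function. Retrying only hides the symptom, so it is still worth checking whether the increase lines up with nodes being out or migrations running.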

