Customer is finding “INFO: task NAME:PID blocked for more than 120 seconds” in
dmesg. This indicates that the process requested a
blocking resource such as a disk and wasn’t able to be fulfilled for more than 120
seconds and subsequently abandoned by the kernel. There are two known situations
where this has been observed with Aerospike.
- Aerospike segfaulted resulting in a very large coredump being flushed to disk.
- More than
vm.dirty_ratiodirty disk pages exist in RAM and are being flushed to disk.
For Aerospike Segfault
/var/log/aerospike/aerospike.log for log lines originating from
signal.c. Or in the event that a stack trace is not printed look for evidence
that the server shutdown such as config being printed from the
context without message indicating shutdown from
signal.c or the
For Flushing of Dirty Disk Pages
For this situation the process should be “readahead.cron”.
INFO: task readahead.cron:12101 blocked for more than 120 seconds.
By default Linux will use up to
vm.dirty_ratio of available memory for dirty
file cache pages. These are pages that for performance purposes, are not
immediately flushed to disk. In CentOS 6, the default is 20. After this mark has
been reached the file system flushes all outstanding writes to disk causing all
following IOs going synchronous. For modern personal workstations with more than
4GB RAM this is a problem… For production servers with 128GB or more RAM this
is a horrible problem! The subsequent writes will cause a huge IO spike driving
up write latencies. In the case here the IO subsystem is not fast enough to
flush the data withing 120 seconds. This especially happens on systems with a
lot of memory.
This has been fixed in recent kernels but again older kernels have this issue. We need to check into this.
vm.dirty_ratio is 20 for our test nodes.
Instead of using
vm.dirty_ratio production boxes should use
vm.dirty_bytes = 209715200 and
vm.dirty_background_bytes = 104857600 which
is 200MiB and 100MiB respectively. Setting
vm.dirty_background_bytes automatically sets the ratio counterparts to 0
To configure these changes open
/etc/sysctl.conf and comment any reference to
vm.dirty_background_ratio and then add the recommended values above to the end
of the file.
... vm.dirty_bytes = 209715200 vm.dirty_background_bytes = 104857600
After saving the change run
sudo sysctl -p