Solution: Migration stalls with record too small
When a cluster is migrating, the migration does not complete, and nodes with incoming migrations report the following errors in their logs:
Mar 05 2020 01:41:07 GMT: WARNING (flat): (flat.c:135) record too small 0
Mar 05 2020 01:41:07 GMT: WARNING (migrate): (migrate.c:1398) handle insert: got bad record
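To check whether a node is hitting this problem, you could count these two warnings in its server log. The Python sketch below is one way to do it. The regular expressions assume the exact wording of the sample warnings, which may differ between server versions; the function name `scan_log` is hypothetical, and the log file path you pass in depends on your own logging configuration.

```python
import re
import sys
from typing import Dict, Iterable

# Patterns built from the two sample warnings. The wording is an assumption
# and may vary between Aerospike server versions.
RECORD_TOO_SMALL = re.compile(r"WARNING \(flat\): .*record too small (\d+)")
BAD_RECORD = re.compile(r"WARNING \(migrate\): .*handle insert: got bad record")


def scan_log(lines: Iterable[str]) -> Dict[str, int]:
    """Count 'record too small' and 'got bad record' warnings in log lines."""
    counts = {"record_too_small": 0, "zero_size": 0, "bad_record": 0}
    for line in lines:
        match = RECORD_TOO_SMALL.search(line)
        if match:
            counts["record_too_small"] += 1
            # A reported size of 0 is the symptom described in this article.
            if match.group(1) == "0":
                counts["zero_size"] += 1
        if BAD_RECORD.search(line):
            counts["bad_record"] += 1
    return counts


SAMPLE = [
    "Mar 05 2020 01:41:07 GMT: WARNING (flat): (flat.c:135) record too small 0",
    "Mar 05 2020 01:41:07 GMT: WARNING (migrate): (migrate.c:1398) handle insert: got bad record",
]

if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Path to the server log file, passed on the command line.
        with open(sys.argv[1], errors="replace") as log:
            print(scan_log(log))
    else:
        # No path given: run against the sample lines instead.
        print(scan_log(SAMPLE))
```

Non-zero `zero_size` and `bad_record` counts on the receiving node suggest this issue; the next step is to find which source node is sending the bad records.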
This error occurs when a node in the cluster has a bad disk. The source node knows that it needs to send a record, but because of the disk error the record it reads is 0 bytes long. The error is therefore logged on the node receiving the migration, because that node cannot write an inbound record of size 0. When you check the status of migrations, a single node will most likely turn out to be the source of all the problematic partitions.
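Once you have a list of the stalled inbound partitions and the node each one is migrating from, tallying them by source node makes a single faulty node stand out. The sketch below assumes you have already built that list as (partition ID, source node ID) pairs; extracting it from the migration status output is not shown. The function name `likely_bad_source`, the input format, the node IDs and the 0.8 default threshold are all hypothetical choices for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def likely_bad_source(
    stalled: Iterable[Tuple[int, str]], threshold: float = 0.8
) -> Optional[str]:
    """Return the source node behind at least `threshold` of the stalled
    partitions, or None if no single node dominates.

    `stalled` is a hypothetical list of (partition_id, source_node_id) pairs.
    """
    counts = Counter(source for _partition_id, source in stalled)
    total = sum(counts.values())
    if total == 0:
        return None
    # The node that appears most often as a migration source.
    node, hits = counts.most_common(1)[0]
    return node if hits / total >= threshold else None


if __name__ == "__main__":
    # Hypothetical node IDs and partition numbers, for illustration only.
    stalled = [(101, "node-a"), (205, "node-a"), (377, "node-a"), (412, "node-b")]
    print(likely_bad_source(stalled, threshold=0.7))
```

Returning `None` when no node dominates is deliberate: if the stalled partitions are spread evenly across sources, the cause may be something other than one bad disk.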
Because the issue is caused by faulty node hardware, the quickest way to let migrations complete is to shut down the problem source node. That node will almost certainly be showing disk errors in dmesg, which can be run manually or as part of the asadm -e collectinfo command. The dmesg output would look similar to the following:
[11055874.801271] Buffer I/O error on dev nvme0n2, logical block 38509628, async page read
[11055888.682385] print_req_error: critical medium error, dev nvme0n2, sector 308077024
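On a busy host the dmesg output can be long, so a short filter that counts disk errors per device may help confirm which disk is failing. This is a sketch that assumes the kernel message wording in the sample output ("Buffer I/O error on dev" and "critical medium error, dev"); other kernels and drivers can phrase disk errors differently, so an empty result is inconclusive rather than proof that the disks are healthy. The function name `failing_devices` is hypothetical.

```python
import re
import sys
from collections import Counter

# Captures the device name from the two kernel messages shown in the sample
# dmesg output. Other kernels or drivers may word disk errors differently.
DISK_ERROR = re.compile(
    r"(?:Buffer I/O error on dev|critical medium error, dev) ([\w-]+)"
)


def failing_devices(dmesg_text: str) -> Counter:
    """Count disk-error messages per block device."""
    return Counter(DISK_ERROR.findall(dmesg_text))


SAMPLE = (
    "[11055874.801271] Buffer I/O error on dev nvme0n2, logical block 38509628, async page read\n"
    "[11055888.682385] print_req_error: critical medium error, dev nvme0n2, sector 308077024\n"
)

if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Path to a file holding saved dmesg output, e.g. from `dmesg > dmesg.txt`.
        with open(sys.argv[1], errors="replace") as f:
            text = f.read()
    else:
        # No path given: run against the sample output instead.
        text = SAMPLE
    for device, count in failing_devices(text).most_common():
        print(f"{device}: {count} error(s)")
```

A device with a steady stream of errors, such as nvme0n2 in the sample, points to the node that should be shut down.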
Shutting down the problem node will cause additional migrations but should have no other negative effect.
Keywords: MIGRATION STALLED DISK ERROR NODE