Solution - Migration stalls with `record too small`


Problem Description

When a cluster is migrating, the migration does not complete, and nodes with incoming migrations report the following errors in their logs:

Mar 05 2020 01:41:07 GMT: WARNING (flat): (flat.c:135) record too small 0
Mar 05 2020 01:41:07 GMT: WARNING (migrate): (migrate.c:1398) handle insert: got bad record
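To check whether a node's log shows this pattern, the two warnings above can be matched line by line. The following is a minimal sketch, assuming log lines in the format shown above; the function name `find_stall_signature` is hypothetical, and the exact message text may vary between server versions.

```python
import re

# Patterns copied from the two warnings shown above (assumption: the wording
# is the same in the server version you are running).
RECORD_TOO_SMALL = re.compile(r"WARNING \(flat\):.*record too small (\d+)")
BAD_RECORD = re.compile(r"WARNING \(migrate\):.*handle insert: got bad record")


def find_stall_signature(lines):
    """Return (list of reported record sizes, count of 'got bad record' lines)."""
    sizes = []
    bad_records = 0
    for line in lines:
        match = RECORD_TOO_SMALL.search(line)
        if match:
            # The number after "record too small" is the size that was received.
            sizes.append(int(match.group(1)))
        elif BAD_RECORD.search(line):
            bad_records += 1
    return sizes, bad_records


sample = [
    "Mar 05 2020 01:41:07 GMT: WARNING (flat): (flat.c:135) record too small 0",
    "Mar 05 2020 01:41:07 GMT: WARNING (migrate): (migrate.c:1398) handle insert: got bad record",
]
print(find_stall_signature(sample))  # ([0], 1)
```

A node receiving migrations that repeatedly logs size-0 records alongside `got bad record` matches the symptom described here.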

Explanation

This error occurs when a node in the cluster has a bad disk. The node knows that it needs to send a record, but because of the disk error the record it sends has a size of 0. The error is reported on the node receiving the migration, because that node cannot write an inbound record of size 0. When checking the status of migrations, a single node is likely to be the source of the problematic partitions.
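Once the stalled inbound partitions and their source nodes have been listed from the migration status, the common source can be found by tallying them. This is a minimal sketch under stated assumptions: how the `(partition_id, source_node)` pairs are extracted from the migration status output is not shown here, and the function name and node IDs are hypothetical.

```python
from collections import Counter


def likely_bad_source(stalled):
    """Given (partition_id, source_node) pairs for stalled inbound migrations
    (hypothetical input format), return the most common source node and the
    fraction of stalled partitions that come from it."""
    if not stalled:
        return None, 0.0
    counts = Counter(node for _, node in stalled)
    node, count = counts.most_common(1)[0]
    return node, count / len(stalled)


# Hypothetical node IDs for illustration only.
stalled = [(12, "node-B"), (873, "node-B"), (2048, "node-B"), (3001, "node-C")]
print(likely_bad_source(stalled))  # ('node-B', 0.75)
```

If one node accounts for most or all of the stalled partitions, that node is the one to check for disk errors.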

Solution

As the issue is caused by a problem with the node's hardware, the quickest way to allow migrations to complete is to shut down the problematic source node. That node will almost certainly be showing disk errors in dmesg, which can be run manually or collected as part of the asadm -e collectinfo command. The dmesg output would look similar to the following:

[11055874.801271] Buffer I/O error on dev nvme0n2, logical block 38509628, async page read
[11055888.682385] print_req_error: critical medium error, dev nvme0n2, sector 308077024
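To confirm which device is failing, the dmesg lines can be filtered for the two error messages shown above and counted per device. This is a minimal sketch, assuming the message wording matches the sample output; kernel messages differ between kernel versions, and the function name `disk_errors_by_device` is hypothetical.

```python
import re
from collections import Counter

# Matches the two messages in the sample dmesg output above and captures the
# device name (assumption: wording matches your kernel version).
DISK_ERROR = re.compile(r"(?:Buffer I/O error on dev|critical medium error, dev) (\w+)")


def disk_errors_by_device(dmesg_lines):
    """Count matching disk error lines per device name."""
    counts = Counter()
    for line in dmesg_lines:
        match = DISK_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "[11055874.801271] Buffer I/O error on dev nvme0n2, logical block 38509628, async page read",
    "[11055888.682385] print_req_error: critical medium error, dev nvme0n2, sector 308077024",
]
print(disk_errors_by_device(sample))  # Counter({'nvme0n2': 2})
```

On the node itself, the input could come from the output of `dmesg` split into lines (reading the kernel log may require elevated privileges).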

Shutting down the problematic node will cause additional migrations but should not have any other negative effect.

Keywords

MIGRATION STALLED DISK ERROR NODE

Timestamp

March 2020

© 2015 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.