As part of a rolling upgrade, I added a 4.9.0.4 node (ip-10-1-14-204) to our 4.8.0.3 cluster. Even after partition migration had finished, the 4.9 node still had a very high rate of write timeouts. I eventually removed it from the cluster, and cluster performance went back to normal.
All nodes use the same configuration.
On the 4.8 nodes I found log entries like these:
Apr 20 2020 09:01:18 GMT: WARNING (rw): (replica_write.c:255) repl_write_handle_op: bad record
Apr 20 2020 09:01:18 GMT: WARNING (flat): (flat.c:183) unsupported storage fields
Apr 20 2020 09:01:18 GMT: WARNING (rw): (replica_write.c:255) repl_write_handle_op: bad record
Apr 20 2020 09:01:18 GMT: WARNING (flat): (flat.c:183) unsupported storage fields
and these on the 4.9 node:
Apr 20 2020 09:04:36 GMT: WARNING (rw): (replica_write.c:418) repl-write ack: no digest
Apr 20 2020 09:04:36 GMT: WARNING (rw): (replica_write.c:418) repl-write ack: no digest
Apr 20 2020 09:04:36 GMT: WARNING (rw): (replica_write.c:418) repl-write ack: no digest
Apr 20 2020 09:04:36 GMT: WARNING (rw): (replica_write.c:418) repl-write ack: no digest
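To compare how often each of these warnings fires on each node (for example, before and after the 4.9 node joins), a small script along these lines could tally them. It is only a sketch and assumes every log line follows the format shown above. `tally_warnings` is a hypothetical helper and is not part of any Aerospike tooling.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpts above, e.g.
# "Apr 20 2020 09:01:18 GMT: WARNING (rw): (replica_write.c:255) repl_write_handle_op: bad record"
LOG_RE = re.compile(
    r"^(?P<ts>\w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2} GMT): "
    r"(?P<level>[A-Z]+) \((?P<context>[^)]+)\): "
    r"\((?P<source>[^)]+)\) (?P<message>.*)$"
)


def tally_warnings(lines):
    """Count WARNING lines per (context, source location, message).

    Lines that do not match the assumed format, or that are not
    WARNING level, are ignored.
    """
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m and m.group("level") == "WARNING":
            key = (m.group("context"), m.group("source"), m.group("message"))
            counts[key] += 1
    return counts
```

Usage: open a node's log file (the path depends on your setup) and pass the file object to `tally_warnings`. Then `counts.most_common()` lists the noisiest warnings first, which makes the 4.8 and 4.9 nodes easy to compare side by side.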
I didn’t find any special notes in the documentation about upgrading from 4.8 to 4.9, but it looks like a normal rolling upgrade is not safe here. My earlier rolling upgrades from 4.6 to 4.7 and from 4.7 to 4.8 went through without any issues.
Any help?