Upgrade from 4.8.0.3 to 4.9.0.4: high rate of write timeouts

For a rolling upgrade, I added a 4.9.0.4 node (ip-10-1-14-204) to the 4.8.0.3 cluster. Even after partition migrations were done, the 4.9 node still had a very high rate of write timeouts. Eventually I removed it from the cluster and the cluster's performance returned to normal.
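
For reference, this is roughly how I verified migrations had completed before judging the node (a sketch; "test" is a placeholder for my namespace name):

asadm -e "info"
asinfo -v "namespace/test" -l | grep partitions_remaining    # migrate tx/rx counters should both be 0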

All the nodes use the same configuration.

I found logs like this on the 4.8 nodes:

Apr 20 2020 09:01:18 GMT: WARNING (rw): (replica_write.c:255) repl_write_handle_op: bad record
Apr 20 2020 09:01:18 GMT: WARNING (flat): (flat.c:183) unsupported storage fields
Apr 20 2020 09:01:18 GMT: WARNING (rw): (replica_write.c:255) repl_write_handle_op: bad record
Apr 20 2020 09:01:18 GMT: WARNING (flat): (flat.c:183) unsupported storage fields

and this on the 4.9 node:

Apr 20 2020 09:04:36 GMT: WARNING (rw): (replica_write.c:418) repl-write ack: no digest 
Apr 20 2020 09:04:36 GMT: WARNING (rw): (replica_write.c:418) repl-write ack: no digest
Apr 20 2020 09:04:36 GMT: WARNING (rw): (replica_write.c:418) repl-write ack: no digest
Apr 20 2020 09:04:36 GMT: WARNING (rw): (replica_write.c:418) repl-write ack: no digest

I didn’t find any notes in the documentation about upgrading from 4.8 to 4.9, but it seems the usual rolling upgrade is not safe here. I previously upgraded from 4.6 to 4.7 and from 4.7 to 4.8 without any issues.

Any help?

Could you share your aerospike.conf as well as the output of asadm -e "info"?
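
For example (assuming the default config path /etc/aerospike/aerospike.conf; adjust if yours differs):

cat /etc/aerospike/aerospike.conf
asadm -e "info"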

Are you running Aerospike Community or Enterprise?

Never mind, we have confirmed that this is a bug in Aerospike Community Edition. We are working on a hotfix.

By the way, this bug could have corrupted the data on that node, so you should probably wipe that node's disks and have it rejoin the 4.8 cluster. Migrations will repopulate the disks.
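
A minimal sketch of that procedure, assuming raw-device storage on a hypothetical device /dev/nvme0n1 (substitute your actual devices and your package manager's commands):

systemctl stop aerospike
blkdiscard /dev/nvme0n1    # or: dd if=/dev/zero of=/dev/nvme0n1 bs=1M
# reinstall the 4.8.0.3 server package, then:
systemctl start aerospike    # the node rejoins empty and migrations repopulate it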

Thank you for reporting this issue.

Yes, we are running Aerospike CE. Good to know you located the issue. Thanks for the reminder; the 4.9 node has been terminated.
