We currently have a 3-replica C-4.5.0.10 cluster. We would like to upgrade to a recent Community version without full downtime. A rolling upgrade means that the cluster will run mixed versions for about two days due to rebalancing. We use record TTL (which implies expirations, if I understand correctly), deletions, and UDFs (existing ones; we are not planning to modify UDFs during the upgrade, so I guess that is not a problem?). Storage devices will be rolling-wiped to minimize risks.
After reading https://www.aerospike.com/docs/operations/upgrade/aerospike/special_upgrades/4.5.1/index.html, do I understand correctly that:
1. Device wiping does not help, because the problem is an incompatibility in the internal server protocol?
2. Does TTL get replicated between mixed versions (for example, on a ‘touch’)?
3. Records whose master is on 4.5.1+ will not be expired on <4.5.1 nodes?
3.1 Records whose master is on <4.5.1 will send unneeded expirations to 4.5.1+ replicas, but that is not a problem?
3.2 What are the bad consequences? Let’s imagine there is enough reserved storage space. Then let’s imagine something goes wrong and, for example, the master changes back again. Expired records won’t be returned to a client (or re-replicated?), will they?
3.3 After the rolling upgrade is completed, if some of these not-yet-expired records are left, will they be expired by the new version's mechanism? For example, imagine 2 of 3 (or 1 of 3) machines run the new version, and during this time some records on the 3rd one don't get expired. When I then upgrade this machine, will it expire these records?
While writing this, I recalled that records don’t really get deleted in place until nsup reviews them, and I got even more confused in the context of rebalances. What does the act of expiration mean then? I also fail to understand prole-extra-ttl and what value I should set it to. If nsup already sees the expired records, why doesn’t it delete them? And why is it a TTL rather than a flag: why does it matter whether it is set to 2 seconds, 2 minutes, or anything else? Does it exist only in 4.5.0.10 and is it already removed in 4.5.0.11? Does that mean I must remain on 4.5.0.10 to use it, or is it somehow always enabled automatically starting from 4.5.0.11?
So, to sum up: does enabling prole-extra-ttl for rolling restarts fully mitigate the problem with expirations and deletions during long rebalances?
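To keep track of the mixed-version window while the upgrade is in progress, I was thinking of something like the rough sketch below. It only classifies nodes by their build string relative to 4.5.1. The node names and the `builds` mapping are made up for illustration; in practice I assume the build strings would be collected from each node (for example, from the `build` info command), which is not shown here.

```python
# Hypothetical helper: split nodes into "still below 4.5.1" and "4.5.1 or newer".
# The threshold comes from the special-upgrade page linked above.
THRESHOLD = (4, 5, 1)


def parse_build(build):
    """Turn a build string such as '4.5.0.10' into a tuple of ints.

    Anything after a '-' (a package suffix, if present) is ignored.
    """
    numeric = build.strip().split("-")[0]
    return tuple(int(part) for part in numeric.split("."))


def split_by_version(builds):
    """Return (old_nodes, new_nodes) given a {node_name: build_string} mapping."""
    old_nodes, new_nodes = [], []
    for node, build in sorted(builds.items()):
        if parse_build(build) >= THRESHOLD:
            new_nodes.append(node)
        else:
            old_nodes.append(node)
    return old_nodes, new_nodes


if __name__ == "__main__":
    # Illustrative data only: one node upgraded, two still on the old build.
    builds = {"node-a": "4.5.0.10", "node-b": "4.5.1.5", "node-c": "4.5.0.10"}
    old_nodes, new_nodes = split_by_version(builds)
    print("below 4.5.1:", old_nodes)
    print("4.5.1 or newer:", new_nodes)
```

As long as both lists are non-empty, the cluster is in the mixed state that my questions above are about.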