A fair bit of paranoia went into this. There are edge cases that simply checking tx_remaining==0 && rx_remaining==0 won’t catch. For example, a split-brain scenario, or a node that fails to join or re-form a cluster, to name a couple, would also report migrate_remaining==0. I think the official answer from Aerospike has always been to use the ‘cluster-stable’ info command, but there are plenty of other checks you could add on top of that depending on your appetite for issue avoidance (a pre-check for wiggle room in capacity, a check for recent stability and changes, etc.).
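To make the split-brain point concrete, here is a rough sketch of the kind of check I mean, not what the module actually does. It assumes that ‘cluster-stable’ returns the cluster key when the node considers the cluster stable and a string starting with ERROR otherwise, and that it accepts the size and ignore-migrations parameters; verify that against the docs for your server version. The info callable and the node names are hypothetical stand-ins for however you send info commands to each node.

```python
from typing import Callable, Iterable


def cluster_is_stable(
    nodes: Iterable[str],
    info: Callable[[str, str], str],
    expected_size: int,
) -> bool:
    """Return True only if every node reports the same cluster key.

    `info(node, command)` is a hypothetical helper that sends an info
    command to one node and returns the raw response string.
    """
    # Assumed command syntax; check it against your server version.
    command = f"cluster-stable:size={expected_size};ignore-migrations=no"
    cluster_keys = set()
    for node in nodes:
        try:
            response = info(node, command).strip()
        except Exception:
            # An unreachable node is treated as "not stable".
            return False
        if not response or response.upper().startswith("ERROR"):
            return False
        cluster_keys.add(response)
    # A split brain shows up as more than one distinct cluster key,
    # even when each side reports zero migrations remaining.
    return len(cluster_keys) == 1
```

Calling it with the node list, your info helper, and the expected cluster size gives a single yes/no answer. Comparing keys across all nodes is the part a plain migrate counter check misses: each half of a split cluster can look idle on its own.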
I don’t think you’re entirely off base, but how many checks are sufficient is a judgement call you’ll have to make on your own. If you want to scrap the whole module and do a rewrite together, I’d be happy to add you as a contributor and work with you.
On the note of the migrate_partitions_remaining super metric, I don’t think that existed when I wrote this. There were only namespace-level migrate metrics at the time.