Hi guys! We have been using Aerospike at our company for quite a while now. The cluster currently has 11 nodes and holds more than 4,000,000,000 records.
Yet we’ve encountered a strange problem: when one node fails (an SSD breaks or hangs), we lose a portion of the records. Last time we lost approximately 20% of them. We estimate the loss by comparing the daily backups with backups made after migration has completed. We are using Aerospike 3.3.21 (replication-factor=2). So, what’s wrong with it?