We have a 6-node cluster with replication factor 3 (Aerospike CE 3.6.0). The namespace is configured to store data in memory with a data persistence file. Data is updated once a day at midnight. During a simulation of a massive hardware failure we killed 3 nodes (approx. 10 hours after the last update). When we (re)started the nodes, old data appeared for some keys. This affects approx. 3% of the keys and persists even after all migrations have finished. We're using the Python client and only simple puts/gets.
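For context, the access pattern is nothing more than this (a minimal sketch; the namespace, set, bin and key names below are just placeholders):

```python
import aerospike

# Hypothetical seed host; the real cluster has 6 nodes.
config = {'hosts': [('10.0.0.1', 3000)]}
client = aerospike.client(config).connect()

# Placeholder namespace/set/key names.
key = ('mynamespace', 'myset', 'some-key')

# Daily batch update at midnight: one put per key, no TTL, no deletes.
client.put(key, {'value': 'payload'})

# Reads during the day: one get per key.
_, meta, bins = client.get(key)

client.close()
```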
We are working on a test with more details (logging the generation number and storing the timestamp of each update). Could you give me some hints on how to track down this issue?
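This is roughly the check we are adding (a sketch only; the bin name and logger are made up for illustration):

```python
import time
import logging
import aerospike

logging.basicConfig(level=logging.INFO)
log = logging.getLogger('stale-data-check')

config = {'hosts': [('10.0.0.1', 3000)]}
client = aerospike.client(config).connect()

def put_with_timestamp(key, bins):
    # Store the wall-clock time of the update inside the record itself,
    # so a stale record can later be spotted by comparing this bin
    # against the expected midnight update time.
    bins['updated_at'] = int(time.time())
    client.put(key, bins)

def check_record(key):
    # The client returns metadata alongside the bins: 'gen' is the record's
    # generation counter and 'ttl' the remaining time-to-live in seconds.
    # A stale copy should show an older generation and 'updated_at' value.
    _, meta, bins = client.get(key)
    log.info('key=%s gen=%s ttl=%s updated_at=%s',
             key[2], meta.get('gen'), meta.get('ttl'), bins.get('updated_at'))
    return meta, bins
```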
On the server side the namespace has default-ttl 0. In the application we don't use any other TTL settings. We know the behaviour of deletes, but we don't perform any delete operations on keys (at least not for a few months).
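To rule out TTL on the server side, we plan to verify default-ttl on every node with an info call like this (rough sketch: it assumes a client version that has info_all(), and that the response is a ';'-separated key=value string, so parsing may need adjusting for your version):

```python
import aerospike

config = {'hosts': [('10.0.0.1', 3000)]}
client = aerospike.client(config).connect()

# 'namespace/<name>' returns that namespace's configuration and statistics.
responses = client.info_all('namespace/mynamespace')

for node, (err, resp) in responses.items():
    if not resp:
        continue
    for pair in resp.split(';'):
        if 'default-ttl' in pair:
            print(node, pair.strip())

client.close()
```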
We will try explicitly setting fsync-max-sec - thanks for the hint.
We aren't able to replicate this issue now. Similar behaviour can be seen when there isn't enough space and some records have been evicted.
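When we suspected eviction, we looked at the eviction/expiration counters in the namespace statistics with a similar info call (same caveats about the info API and exact stat names as above):

```python
import aerospike

config = {'hosts': [('10.0.0.1', 3000)]}
client = aerospike.client(config).connect()

responses = client.info_all('namespace/mynamespace')

for node, (err, resp) in responses.items():
    if not resp:
        continue
    # Print any counter related to eviction or expiration; non-zero values
    # would mean records were dropped because of space pressure or TTL.
    for pair in resp.split(';'):
        if 'evict' in pair or 'expired' in pair:
            print(node, pair.strip())

client.close()
```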
Feel free to delete this topic - I don't want to scare other users. Anyway, thanks for your hints.