Could the cluster have been performing migrations when the reads came back not found? Prior to 3.6.x, the batch subsystem would return a not-found result during migrations if the record no longer resided on the node handling the batch request.
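For clarity, this is roughly how a not-found surfaces in a batch read with the Java client. A minimal sketch only, assuming an AerospikeClient instance named client and a logger like the ones in your snippets; the namespace, set, and key values are placeholders:

// Sketch only: missing records come back as null entries in the result array.
Key[] keys = new Key[] {
    new Key("test", "demo", "request-1"),
    new Key("test", "demo", "request-2")
};
Record[] records = client.get(new BatchPolicy(), keys);
for (int i = 0; i < records.length; i++) {
    if (records[i] == null) {
        logger.warn("Batch read: {} not found", keys[i].userKey);
    }
}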
I am using the async operations in the following manner:
public void store() {
    WritePolicy expirationWritePolicy = new WritePolicy();
    expirationWritePolicy.sendKey = true;
    expirationWritePolicy.priority = Priority.HIGH;
    expirationWritePolicy.expiration = 10;

    Key key = new Key(namespace, SET_NAME, requestId);
    Bin bin = new Bin(BIN_NAME, serializer.toBinary(budgetCommit));
    Bin extra = new Bin("extra", "data");

    client.put(expirationWritePolicy, new WriteListener() {
        @Override
        public void onSuccess(Key key) {
            logger.info("Succeed to store {}", requestId());
        }

        @Override
        public void onFailure(AerospikeException exception) {
            logger.error(exception, "Fail to store {}", key);
        }
    }, key, extra, bin);
}
public void retrieve() {
    WritePolicy defaultWritePolicy = new WritePolicy();
    defaultWritePolicy.priority = Priority.LOW;
    defaultWritePolicy.sendKey = true;

    Key key = new Key(namespace, SET_NAME, requestId);
    Bin closeExtra = new Bin("extra", "_closed");

    client.operate(defaultWritePolicy, new RecordListener() {
        @Override
        public void onSuccess(Key key, Record record) {
            if (record.getValue(BIN_NAME) == null) {
                logger.error("Fail to retrieve {}", requestId);
            }
        }

        @Override
        public void onFailure(AerospikeException exception) {
            logger.error("Fail to retrieve {} : {}", requestId, exception.getMessage());
        }
    }, key, Operation.append(closeExtra), Operation.get());
}
[INFO] [12/01/2016 08:37:16.732] Succeed to store 379e67dc-945d-4717-97a7-721cc8093c05
[ERROR] [12/01/2016 08:37:16.736] Fail to retrieve 379e67dc-945d-4717-97a7-721cc8093c05
The onSuccess callback is called when there is an ack from the Aerospike server.
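If it helps, one way to see what actually comes back on that ack path is to log the bins on the returned record. This is only a sketch of what I could change (not what is currently running), using the same logger, requestId and BIN_NAME as above:

@Override
public void onSuccess(Key key, Record record) {
    // record.bins holds whatever Operation.get() returned; if the original record was
    // already gone and the append created a fresh one, only the "extra" bin will be here.
    if (record == null || record.getValue(BIN_NAME) == null) {
        logger.error("Fail to retrieve {}; bins present: {}", requestId,
                record == null ? null : record.bins.keySet());
    }
}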
Ah, my suspicion is that you have breached an eviction high-water mark, either memory or disk. We can confirm this by running asadm -e "info namespace" and checking whether the Used% values are above their respective HWM Mem% or HWM Disk% thresholds.
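If it is easier to check from the application side, the same information can be pulled through the Java client's info API. A rough sketch, assuming the same client, logger and namespace fields as in your snippets; the stat names ("hwm_breached", "evicted_objects") vary by server version, so verify them against the metrics reference:

// Diagnostic sketch only; stat names depend on the server version.
Node node = client.getNodes()[0];
String stats = Info.request(null, node, "namespace/" + namespace);
for (String pair : stats.split(";")) {
    if (pair.startsWith("hwm_breached") || pair.startsWith("evicted_objects")) {
        logger.info("Namespace stat: {}", pair);
    }
}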
I suspect you may have at least two TTL values that differ significantly. This usage pattern will cause the records with the lower TTL to be evicted first when eviction kicks in. For this use case we have set-disable-eviction, which will exclude a particular set from eviction. The configuration reference page shows how to dynamically set this option for a set, and a static configuration example can be found in the set-data-retention configuration documentation.
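As an illustration only, the dynamic form can also be issued through the Java client's info API. This is a sketch with placeholder names; please verify the set-config syntax against the configuration reference for your server version:

// Sketch only; confirm the set-config syntax in the configuration reference.
String command = "set-config:context=namespace;id=" + namespace
        + ";set=" + SET_NAME + ";set-disable-eviction=true";
for (Node node : client.getNodes()) {
    String response = Info.request(null, node, command);
    logger.info("set-disable-eviction on {}: {}", node.getName(), response);
}

The static equivalent is a set-disable-eviction entry inside the corresponding set block of the namespace stanza in aerospike.conf, as shown in the set-data-retention documentation.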