Summary
We are observing KEY_NOT_FOUND (ResultCode 2) errors on AerospikeClient.put() calls when using RecordExistsAction.UPDATE combined with GenerationPolicy.EXPECT_GEN_EQUAL. Per the Aerospike documentation, RecordExistsAction.UPDATE should “Create or update record. Merge write command bins with existing bins.”
The error occurs when a record ceases to exist between our application’s read and subsequent write — either because a concurrent cleanup process deleted it, or because the record’s TTL expired. Both the initial write attempt and our application-level retry fail with the same error, resulting in the incoming data being silently dropped.
We are seeking clarification on whether this is expected behavior and guidance on the recommended approach.
Architecture & Scenario
We have two independent processes operating on the same Aerospike namespace/set:
1. Writer Process (UPC) — Performs a read-modify-write cycle on user profile records using selective bin-level updates:
- Reads the record (captures record.generation for optimistic concurrency)
- Modifies specific bins (adds/updates audience data)
- Writes back using RecordExistsAction.UPDATE + GenerationPolicy.EXPECT_GEN_EQUAL with the generation from the read
2. Cleanup Process — Periodically scans for stale or empty records and deletes them using GenerationPolicy.EXPECT_GEN_EQUAL to avoid deleting records modified since the scan. Durable deletes are enabled (durableDelete = true).
The race condition occurs in two scenarios:
Scenario A — Concurrent delete:
T1: Writer reads record (generation=N)
T2: Cleanup process deletes the record (generation check passes, record removed)
T3: Writer writes with generation=N → KEY_NOT_FOUND
Scenario B — TTL expiration:
T1: Writer reads record (generation=N, TTL is near expiry)
T2: Record TTL expires, Aerospike removes the record
T3: Writer writes with generation=N → KEY_NOT_FOUND
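The two timelines above can be replayed with a toy in-memory model. This is only a minimal sketch: it assumes the server behaves the way we observe in production (a generation-checked write to a missing key returns result code 2), and the class `GenerationRaceSketch` and all of its methods are hypothetical stand-ins, not Aerospike APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the race. It encodes the behavior we OBSERVE, not Aerospike's
// implementation. Every name in this class is hypothetical.
class GenerationRaceSketch {
    static final int OK = 0;
    static final int KEY_NOT_FOUND = 2;
    static final int GENERATION_ERROR = 3;

    // key -> current generation; an absent key means "no record".
    private final Map<String, Integer> generations = new HashMap<>();

    void create(String key) { generations.put(key, 1); }

    // Returns the record's generation, or null if the record does not exist.
    Integer read(String key) { return generations.get(key); }

    // Models the cleanup process: delete guarded by EXPECT_GEN_EQUAL.
    int deleteIfGenerationEquals(String key, int expected) {
        Integer current = generations.get(key);
        if (current == null) return KEY_NOT_FOUND;
        if (current != expected) return GENERATION_ERROR;
        generations.remove(key);
        return OK;
    }

    // Models TTL expiry: the record simply disappears.
    void expire(String key) { generations.remove(key); }

    // Models the writer: UPDATE + EXPECT_GEN_EQUAL, as observed in production.
    int writeIfGenerationEquals(String key, int expected) {
        Integer current = generations.get(key);
        if (current == null) return KEY_NOT_FOUND; // the behavior we are asking about
        if (current != expected) return GENERATION_ERROR;
        generations.put(key, current + 1);
        return OK;
    }

    // Scenario A: concurrent delete between read and write.
    static int scenarioA() {
        GenerationRaceSketch store = new GenerationRaceSketch();
        store.create("user:1");
        int n = store.read("user:1");                       // T1: writer reads
        store.deleteIfGenerationEquals("user:1", n);        // T2: cleanup deletes
        return store.writeIfGenerationEquals("user:1", n);  // T3: writer writes
    }

    // Scenario B: TTL expiry between read and write.
    static int scenarioB() {
        GenerationRaceSketch store = new GenerationRaceSketch();
        store.create("user:1");
        int n = store.read("user:1");                       // T1: writer reads
        store.expire("user:1");                             // T2: TTL expires
        return store.writeIfGenerationEquals("user:1", n);  // T3: writer writes
    }

    public static void main(String[] args) {
        System.out.println("Scenario A result code: " + scenarioA()); // 2
        System.out.println("Scenario B result code: " + scenarioB()); // 2
    }
}
```

In both scenarios the writer still holds generation N from T1, but by T3 there is no record left to compare it against.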
Relevant Write Policy Configuration (Writer Process)
WritePolicy writePolicy = new WritePolicy();
writePolicy.recordExistsAction = RecordExistsAction.UPDATE;
writePolicy.generationPolicy = GenerationPolicy.EXPECT_GEN_EQUAL;
writePolicy.generation = generation; // generation from the prior read, or 0 if the record was not found
writePolicy.socketTimeout = 250; // milliseconds
writePolicy.totalTimeout = 300; // milliseconds
writePolicy.maxRetries = 2;
writePolicy.sleepBetweenRetries = 0;
writePolicy.commitLevel = CommitLevel.COMMIT_ALL;
writePolicy.durableDelete = false;
writePolicy.sendKey = true;
We can provide additional configuration details (read policy, cleanup process policy, batch policy, etc.) if needed.
Application-Level Retry Logic
When the first write fails (any failure, not just KEY_NOT_FOUND), the application retries once:
1. Re-reads the record from Aerospike
2. Reapplies the modifications to the fresh data
3. Writes again using UPDATE + EXPECT_GEN_EQUAL with the new generation
However, when the record no longer exists:
1. The re-read returns null (no record found)
2. The application constructs a new profile with version = null, which maps to generation = 0
3. The write is issued with RecordExistsAction.UPDATE + GenerationPolicy.EXPECT_GEN_EQUAL + generation = 0
- This also fails with KEY_NOT_FOUND
- Both attempts fail, and the incoming data is silently dropped.
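The retry path can be sketched as follows. This is an illustration under stated assumptions, not our production code: `Store` is a hypothetical stand-in for the client's get/put calls, and `MissingRecordStore` hard-codes the KEY_NOT_FOUND result we observe when the record is gone.

```java
// Sketch of the writer's retry path. Store is a hypothetical abstraction,
// not an Aerospike API; the stubbed results mirror what we see in production.
class RetrySketch {
    static final int OK = 0;
    static final int KEY_NOT_FOUND = 2;

    interface Store {
        // Returns the record's generation, or null when no record exists.
        Integer readGeneration(String key);
        // Models put() with UPDATE + EXPECT_GEN_EQUAL; returns a result code.
        int write(String key, int expectedGeneration);
    }

    // One initial attempt plus one application-level retry.
    // Returns true if the data was written, false if it was dropped.
    static boolean writeWithOneRetry(Store store, String key, int generationFromFirstRead) {
        if (store.write(key, generationFromFirstRead) == OK) {
            return true;
        }
        Integer fresh = store.readGeneration(key);     // step 1: re-read
        int generation = (fresh == null) ? 0 : fresh;  // version = null maps to generation 0
        return store.write(key, generation) == OK;     // step 3: second write
    }

    // A store whose record has already been deleted or expired.
    static final class MissingRecordStore implements Store {
        int writeAttempts = 0;

        public Integer readGeneration(String key) {
            return null;
        }

        public int write(String key, int expectedGeneration) {
            writeAttempts++;
            return KEY_NOT_FOUND; // observed result for any generation, including 0
        }
    }

    public static void main(String[] args) {
        MissingRecordStore store = new MissingRecordStore();
        boolean written = writeWithOneRetry(store, "user:1", 7);
        System.out.println("written=" + written + ", attempts=" + store.writeAttempts);
        // prints: written=false, attempts=2
    }
}
```

Because the retry reuses the same policy combination, it has no path that can succeed once the record is gone.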
Error Details
Exception from production logs:
com.aerospike.client.AerospikeException: Error 2,1,0,250,300,2,BB9<redacted> <node-ip>:<port>
Error field breakdown (as we read it): Error <resultCode>,<iteration>,<inDoubt>,<socketTimeout>,<totalTimeout>,<maxRetries>,<key-digest> <node-ip>:<port>
- resultCode = 2 (KEY_NOT_FOUND)
- inDoubt = 0 (the server definitively responded; this is not a timeout ambiguity)
Observed error rate: The errors have been occurring for some time, with peaks reaching ~930 errors/sec and recurring bursts in the 100-600 errors/sec range. The pattern aligns with the cleanup process scan cycles.
Questions
1. When RecordExistsAction.UPDATE is used with GenerationPolicy.EXPECT_GEN_EQUAL and the target record does not exist (either deleted or TTL-expired), is KEY_NOT_FOUND the expected server response? The documentation describes UPDATE as “Create or update record” — we want to confirm whether the “create” semantic is suppressed when a generation policy is set.
2. Specifically for generation = 0 on a non-existent key: is there any combination of generation value and policy that would allow UPDATE to create the record, or does EXPECT_GEN_EQUAL unconditionally require an existing record to compare against?
3. Given that durable deletes are enabled on the cleanup process, does the resulting tombstone interact with the generation check in any way? (i.e., does a tombstone retain a generation that could be matched, or is it transparent to subsequent writes?)
4. What is the recommended approach for maintaining optimistic concurrency control (generation checks) on writes while gracefully handling the case where the record was legitimately removed between read and write?
Environment
- Storage: data on disk, indexes in memory
- Replication factor: 2
- Client: com.aerospike:aerospike-client-jdk8:10.1.0
Any guidance from the Aerospike team or community would be greatly appreciated.
Thank you.