Handling timeout in case of counter bin

To rephrase the question, “is there a way to know if a write definitely applied or definitely didn’t apply?”

First, timeouts are not the only error you should be concerned with. Newer clients have an ‘inDoubt’ flag associated with errors that will indicate that the write may or may not have applied.

There isn’t a built-in way to resolve an in-doubt transaction to a definitive answer, and if the network is partitioned there isn’t a way in AP mode to rigorously resolve in-doubt transactions. Rigorous methods do exist for ‘Strong Consistency’ mode; the same methods can be used to handle common AP scenarios, but they will fail under partition.

The method I have used is as follows:

  1. Each record will need a list bin, which will contain the last N transaction ids.
    • For my use case, each client had a unique 2-byte identifier, each client thread had a unique 2-byte identifier, and each client thread had a 4-byte counter. A transaction-id was then an 8-byte identifier packed from the two ids and the counter.
  2. Read the record’s metadata with the getHeader API - this avoids reading the record’s bins from storage.
    • Note - my use case wasn’t an increment, so I actually had to read the record and write with a generation check. This pattern should be more efficient for a counter use case.
  3. Write the record using operate, with gen-equal set to the read generation, and with these operations: increment the integer bin, prepend to the list of txns, and trim the list of txns. You will prepend your transaction-id to the txns list and then trim the list to the max size you selected.
    • N needs to be large enough that a client can be sure to have enough time to verify its transaction given the contention on the key. N also affects the stored size of the record, so choosing too large a value will cost disk resources and choosing too small a value will render the algorithm ineffective.
  4. If the transaction is successful then you are done.
  5. If the transaction is ‘inDoubt’ then read the key and check the txns list for your transaction-id. If present then your transaction ‘definitely succeeded’.
  6. If your transaction-id isn’t in txns, repeat step 3 with the generation returned from the read in step 5. The exception is that a ‘generation error’ on this retry also needs to be considered ‘in-doubt’, since it may have been the previous attempt that finally applied.
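
The steps above can be sketched roughly as follows. This is only an illustration in Python against an in-memory stand-in for the server, not the real client API: `FakeStore`, `InDoubtError`, `GenerationError`, `make_txn_id`, `increment_once`, `MAX_TXNS` and the `fail_plan` fault injection are all hypothetical names I made up for the example.

```python
from dataclasses import dataclass, field

MAX_TXNS = 8  # "N": hypothetical size of the txns list


class InDoubtError(Exception):
    """Stand-in for an error whose inDoubt flag is set."""


class GenerationError(Exception):
    """Stand-in for a failed gen-equal check."""


def make_txn_id(client_id: int, thread_id: int, counter: int) -> int:
    # Pack 2-byte client id, 2-byte thread id and 4-byte counter into 8 bytes.
    return ((client_id & 0xFFFF) << 48) | ((thread_id & 0xFFFF) << 32) | (counter & 0xFFFFFFFF)


@dataclass
class FakeStore:
    """In-memory stand-in for one record on the server (assumption, not a real API)."""
    fail_plan: list = field(default_factory=list)  # "applied" or "lost", one per write
    counter: int = 0
    txns: list = field(default_factory=list)
    generation: int = 0

    def get_header(self) -> int:
        return self.generation

    def read(self):
        return self.generation, list(self.txns)

    def operate(self, expected_gen: int, delta: int, txn_id: int) -> None:
        fault = self.fail_plan.pop(0) if self.fail_plan else None
        if fault == "lost":                 # write never reached the record
            raise InDoubtError()
        if self.generation != expected_gen:
            raise GenerationError()
        self.counter += delta               # increment the integer bin
        self.txns.insert(0, txn_id)         # prepend the transaction-id
        del self.txns[MAX_TXNS:]            # trim the list to N entries
        self.generation += 1
        if fault == "applied":              # write landed but the reply was lost
            raise InDoubtError()


def increment_once(store: FakeStore, delta: int, txn_id: int, max_attempts: int = 5) -> bool:
    """Return True once the increment is known to have applied exactly once.

    False means "still unresolved after max_attempts", not "definitely failed".
    """
    gen = store.get_header()                       # step 2
    for _ in range(max_attempts):
        try:
            store.operate(gen, delta, txn_id)      # step 3
            return True                            # step 4
        except (InDoubtError, GenerationError):
            gen, txns = store.read()               # step 5
            if txn_id in txns:
                return True
            # step 6: loop and retry with the generation just read
    return False
```

For example, `increment_once(FakeStore(fail_plan=["applied"]), 5, make_txn_id(1, 2, 3))` resolves the in-doubt error by finding the id in txns and does not increment a second time.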

Also consider that reading the record in step 5 and not finding the transaction-id in txns does not ensure that the transaction ‘definitely failed’. If you wanted to leave the record unchanged but have a ‘definitely failed’ semantic, you would need to have observed the generation move past the previous write’s gen-check policy. If it hasn’t, you could replace the operation in step 6 with a touch - if it succeeds then the initial write ‘definitely failed’, and if you get a generation-error you will need to check whether you raced the application of the initial transaction, since the initial write may now have ‘definitely succeeded’.
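
A minimal sketch of that touch-based resolution, again in Python against a hypothetical in-memory record rather than the real client (`FakeRecord`, `GenerationError` and `resolve_without_retry` are names invented for the example). It assumes N is large enough that the transaction-id has not been trimmed out of txns, and it ignores the case where the touch itself comes back in-doubt.

```python
class GenerationError(Exception):
    """Stand-in for a failed gen-equal check."""


class FakeRecord:
    """In-memory stand-in for one record (assumption, not a real API)."""

    def __init__(self, generation=0, txns=None):
        self.generation = generation
        self.txns = list(txns or [])

    def read(self):
        return self.generation, list(self.txns)

    def touch(self, expected_gen):
        # A touch only bumps the generation; it is conditioned on gen-equal here.
        if self.generation != expected_gen:
            raise GenerationError()
        self.generation += 1


def resolve_without_retry(rec, write_gen, txn_id):
    """Resolve an in-doubt write that was conditioned on generation `write_gen`."""
    gen, txns = rec.read()
    if txn_id in txns:
        return "succeeded"
    if gen != write_gen:
        # The generation moved past the gen-check, so the write can no longer apply.
        return "failed"
    try:
        # Move the generation ourselves so the in-doubt write can never apply.
        rec.touch(expected_gen=write_gen)
        return "failed"
    except GenerationError:
        # We raced another write; it may have been the initial transaction.
        _, txns = rec.read()
        return "succeeded" if txn_id in txns else "failed"
```

The touch does bump the record’s generation, but the bins stay unchanged.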

Again, with ‘Strong Consistency’ the mentions of ‘definitely succeeded’ and ‘definitely failed’ are accurate statements, but in AP these statements have failure modes (especially around network partitions).
