FAQ - Source generated errors during XDR processing
A record passes through several states while being processed by XDR (Cross-Datacenter Replication), and XDR metrics track its progress through them. Once the service thread reads the record locally and ships it, the remote destination attempts to write the record and returns the completion state of the transaction to the source datacenter. The outcome is reported as a response code, which can be positive or negative. The completion state can be success, a temporary failure (such as key busy or device overload), or a permanent error (such as record too big).
In general, negative error codes are local errors generated at the source, and positive error codes come from the destination's response. If detailed logging is enabled, the response codes appear in the logs.
All of these responses fall into three buckets, as shown in the table below:
| Completion State | Error Code |
|---|---|
| Complete | -2, -3, 0, 2 |
| Retry | -5, -7, -8, -9, 8, 9, 11, 14, 18, 80 |
| Abandoned | -1, -4, -6, and all other positive codes |
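The bucketing in the table can be expressed as a small lookup. The following is a minimal sketch in Python, not XDR's actual implementation: the names `Completion`, `classify`, and the three code sets are hypothetical and chosen only for illustration. It assumes that any negative code not listed in the table is unknown and should be rejected.

```python
from enum import Enum


class Completion(Enum):
    """Completion states from the table (illustrative names)."""
    COMPLETE = "complete"
    RETRY = "retry"
    ABANDONED = "abandoned"


# Code sets copied from the table above.
COMPLETE_CODES = frozenset({-2, -3, 0, 2})
RETRY_CODES = frozenset({-5, -7, -8, -9, 8, 9, 11, 14, 18, 80})
ABANDONED_LOCAL_CODES = frozenset({-1, -4, -6})


def classify(code: int) -> Completion:
    """Map an XDR response code to its completion state."""
    if code in COMPLETE_CODES:
        return Completion.COMPLETE
    if code in RETRY_CODES:
        return Completion.RETRY
    # Per the table, every other positive (destination) code is abandoned.
    if code in ABANDONED_LOCAL_CODES or code > 0:
        return Completion.ABANDONED
    # Assumption: a negative code missing from the table is not a known local error.
    raise ValueError(f"unknown local error code: {code}")
```

For example, `classify(-9)` gives `Completion.RETRY` (connection reset), while `classify(13)` gives `Completion.ABANDONED` because 13 is a positive code outside the retry list.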
Local Error Codes in new XDR:
1. LOCAL_ERR_REC_READ -1 : This error is very rare. It occurs when the service thread fails to read the record from the local disk, which could be caused by a storage device failure. The record is abandoned and not shipped.
2. LOCAL_ERR_REC_NOT_FOUND -2 : When a record is deleted before XDR ships it (for example, an insert followed by a delete), the XDR shipper does not find the record and this not found error is returned. Shipping is not needed because the record is already deleted, so this is considered a complete transaction (the not_found stat will increase).
3. LOCAL_ERR_REC_FILTERED_OUT -3 : When XDR does not ship the record because of a configured filter. Currently only bin filters contribute to this (set filters do not, because they are applied very early, before the transaction is even submitted to the queue). This is considered a complete transaction (the filtered_out stat will increase).
4. LOCAL_ERR_REC_UNREPLICATED -4 : In namespaces configured for strong consistency (SC), if XDR encounters an unreplicated record, it abandons that attempt and triggers re-replication. Re-replication is treated like a fresh write, so this is considered abandoned. For further details, refer to the XDR Delays article.
5. LOCAL_ERR_REC_REPLICATING -5 : In namespaces configured for strong consistency (SC), if XDR encounters a record that is in the process of replication, XDR retries it internally (in this case, the retry_dest statistic will not increase).
6. LOCAL_ERR_REC_ABANDONED -6 : When a namespace is disabled, the pending operations will be abandoned (in this case, the abandoned stat will not increase).
7. LOCAL_ERR_NO_NODE -7 : This error appears when XDR doesn’t know the master node for the record. The record will be retried (the retry_no_node stat will increase, in version 5.1 and up). This typically happens when XDR has not discovered the full destination cluster. One cause is firewall settings, in which case all records will fail; the other usual cause is that the namespace is not defined on the destination cluster.
8. LOCAL_ERR_CONN_BUSY -8 : When the connection to the destination node is busy (a previous transfer is still pending), the record goes into the retry queue (the retry_dest statistic will not increase).
9. LOCAL_ERR_CONN_RESET -9 : When the connection is closed due to an error. Because of pipelining, many records may need to be retried when a connection is reset. A connection can be reset due to timeouts, network issues, or destination node restarts (the retry_conn_reset statistic will increase).
As mentioned in the table above, destination-returned error codes 8, 9, 11, 14, 18 and 80 will be retried, and the retry_dest statistic will increase for those errors. Other positive errors (returned by the destination) will abandon processing and increase the abandoned statistic. Destination-returned error codes are described in the client error code documentation.
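When reading logs, it can help to know which statistic a given code should move. The sketch below collects only what this article states. It is an illustrative Python helper, not part of any Aerospike tool or client: `LocalErr`, `stat_incremented`, and the lookup tables are hypothetical names. Where the article says a statistic does not increase, or does not name one (codes -1, -4, 0 and 2), the helper returns `None` rather than guessing.

```python
from enum import IntEnum
from typing import Optional


class LocalErr(IntEnum):
    """Local (source-side) XDR error codes as listed in this article."""
    LOCAL_ERR_REC_READ = -1
    LOCAL_ERR_REC_NOT_FOUND = -2
    LOCAL_ERR_REC_FILTERED_OUT = -3
    LOCAL_ERR_REC_UNREPLICATED = -4
    LOCAL_ERR_REC_REPLICATING = -5
    LOCAL_ERR_REC_ABANDONED = -6
    LOCAL_ERR_NO_NODE = -7
    LOCAL_ERR_CONN_BUSY = -8
    LOCAL_ERR_CONN_RESET = -9


# Only the local codes for which the article names a statistic that increases.
LOCAL_STATS = {
    LocalErr.LOCAL_ERR_REC_NOT_FOUND: "not_found",
    LocalErr.LOCAL_ERR_REC_FILTERED_OUT: "filtered_out",
    LocalErr.LOCAL_ERR_NO_NODE: "retry_no_node",  # version 5.1 and up
    LocalErr.LOCAL_ERR_CONN_RESET: "retry_conn_reset",
}

# Destination-returned codes that are retried.
DEST_RETRY_CODES = frozenset({8, 9, 11, 14, 18, 80})

# Destination codes treated as complete; the article names no statistic for them.
DEST_COMPLETE_CODES = frozenset({0, 2})


def stat_incremented(code: int) -> Optional[str]:
    """Return the statistic the article says increases for this code, if any."""
    if code < 0:
        # LocalErr(code) raises ValueError for a code not listed above.
        return LOCAL_STATS.get(LocalErr(code))
    if code in DEST_RETRY_CODES:
        return "retry_dest"
    if code in DEST_COMPLETE_CODES:
        return None
    return "abandoned"
```

For example, `stat_incremented(-8)` returns `None` because a busy connection is retried without increasing retry_dest, while `stat_incremented(11)` returns `"retry_dest"`.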