The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.
Details
XDR (Cross-Datacenter Replication) is an Aerospike Enterprise feature designed to synchronize clusters asynchronously over higher-latency links. For more details on XDR, see XDR Architecture.
This knowledge base article covers the key metrics to monitor for XDR performance on a source cluster.
For server version 5.x and above
General Performance:
- Average latency to ship a record to the remote Aerospike cluster: latency_ms
- Time lag for a given data center: lag
- Number of records pending completion: in_progress
- Current throughput: throughput
- Time taken to process records across partitions in one lap: lap_us
- Number of write requests in the XDR in-memory queue: in_queue
- Number of records successfully shipped: success
- Number of partitions that are recovered by reducing the primary index of that partition: recoveries
Errors:
- Number of records being retried at the source due to connection reset: retry_conn_reset
- Number of records being retried at the source due to a temporary error returned by destination node: retry_dest
- Number of records abandoned due to permanent errors returned by destination node: abandoned
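When tracking the error statistics listed above, the change between two samples is often more telling than the raw value. The following is a minimal sketch that assumes these statistics are cumulative counts, which this article does not state explicitly. The function name `counter_deltas` and the reset handling are illustrative choices, not part of any Aerospike tool.

```python
# Error statistics named in the list above.
ERROR_COUNTERS = ("retry_conn_reset", "retry_dest", "abandoned")


def counter_deltas(previous, current, names=ERROR_COUNTERS):
    """Return the per-statistic increase between two samples.

    Assumption: the statistics are cumulative counts. If a value is lower
    than in the previous sample (for example after a restart), the current
    value itself is reported as the increase.
    """
    deltas = {}
    for name in names:
        before = previous.get(name, 0)
        after = current.get(name, 0)
        deltas[name] = after - before if after >= before else after
    return deltas


if __name__ == "__main__":
    earlier = {"retry_conn_reset": 0, "retry_dest": 5, "abandoned": 2}
    later = {"retry_conn_reset": 1, "retry_dest": 9, "abandoned": 2}
    print(counter_deltas(earlier, later))
```

A steadily growing delta for abandoned is the case most worth alerting on, since the article describes those records as failing with permanent errors.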
Monitoring info command:
XDR 5 introduces the new get-stats info command, which allows monitoring of DC-level stats:
asinfo -v 'get-stats:context=xdr;dc=<DCNAME>' -l
and namespace level stats for XDR:
asinfo -v 'get-stats:context=xdr;dc=<DCNAME>;namespace=<NAMESPACE>' -l
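For scripted monitoring, the two command forms above can be assembled programmatically. The sketch below only builds the command string and an argument list. The helper names `build_get_stats_command` and `build_asinfo_argv` are hypothetical, and actually running the result assumes asinfo is installed and can reach a node.

```python
from typing import List, Optional


def build_get_stats_command(dc: str, namespace: Optional[str] = None) -> str:
    """Return the get-stats info command for a DC, optionally scoped to a namespace."""
    command = f"get-stats:context=xdr;dc={dc}"
    if namespace is not None:
        command += f";namespace={namespace}"
    return command


def build_asinfo_argv(dc: str, namespace: Optional[str] = None) -> List[str]:
    """Return an argument list equivalent to: asinfo -v '<command>' -l"""
    return ["asinfo", "-v", build_get_stats_command(dc, namespace), "-l"]


if __name__ == "__main__":
    print(build_asinfo_argv("REMOTE_DC_1"))
    print(build_asinfo_argv("REMOTE_DC_1", "test"))
```

Passing the list form to something like `subprocess.run` avoids shell quoting problems with the semicolons in the command.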
Examples:
Admin> asinfo -v 'get-stats:context=xdr;dc=REMOTE_DC_1' -l
ubuntu-bionic:3000 (1.1.1.201) returned:
lag=0
in_queue=0
in_progress=0
success=0
abandoned=0
not_found=0
filtered_out=0
retry_conn_reset=0
retry_dest=0
recoveries=4096
recoveries_pending=0
hot_keys=0
uncompressed_pct=0.000
compression_ratio=1.000
throughput=0
latency_ms=0
lap_us=1629
Admin> asinfo -v 'get-stats:context=xdr;dc=REMOTE_DC_1;namespace=test' -l
ubuntu-bionic:3000 (1.1.1.201) returned:
lag=0
in_queue=0
in_progress=0
success=0
abandoned=0
not_found=0
filtered_out=0
retry_conn_reset=0
retry_dest=0
recoveries=4096
recoveries_pending=0
hot_keys=0
uncompressed_pct=0.000
compression_ratio=1.000
throughput=0
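Output in the name=value form shown above is straightforward to turn into a dictionary for alerting. The following is a minimal sketch: the function names and the thresholds `max_lag` and `max_in_queue` are hypothetical values chosen for illustration, not recommendations from Aerospike.

```python
import re

# Sample copied from the DC-level example output above.
SAMPLE = """\
Admin> asinfo -v 'get-stats:context=xdr;dc=REMOTE_DC_1' -l
ubuntu-bionic:3000 (1.1.1.201) returned:
lag=0
in_queue=0
in_progress=0
success=0
abandoned=0
not_found=0
filtered_out=0
retry_conn_reset=0
retry_dest=0
recoveries=4096
recoveries_pending=0
hot_keys=0
uncompressed_pct=0.000
compression_ratio=1.000
throughput=0
latency_ms=0
lap_us=1629
"""

# A statistic line is a lowercase name, "=", and a value with no spaces.
_STAT_LINE = re.compile(r"([a-z0-9_]+)=(\S+)")


def _convert(value):
    """Convert a value to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_xdr_stats(output):
    """Parse name=value lines into a dict, skipping prompt and header lines."""
    stats = {}
    for line in output.splitlines():
        match = _STAT_LINE.fullmatch(line.strip())
        if match:
            stats[match.group(1)] = _convert(match.group(2))
    return stats


def find_problems(stats, max_lag=10, max_in_queue=1000):
    """Return messages for statistics above the (hypothetical) thresholds."""
    problems = []
    if stats.get("lag", 0) > max_lag:
        problems.append(f"lag={stats['lag']} exceeds {max_lag}")
    if stats.get("in_queue", 0) > max_in_queue:
        problems.append(f"in_queue={stats['in_queue']} exceeds {max_in_queue}")
    return problems


if __name__ == "__main__":
    parsed = parse_xdr_stats(SAMPLE)
    print(parsed["recoveries"], find_problems(parsed))
```

Suitable thresholds depend on the workload and on the link between the clusters, so they should be tuned against values observed during normal operation.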
For server versions between 3.9.0 and 5.x
General Performance:
- Average latency to ship a record to remote Aerospike cluster: xdr_ship_latency_avg
- Maximum of time lag across all remote data centers: xdr_timelag
- Number of outstanding records: xdr_ship_outstanding_objects
- Current throughput: xdr_throughput
- Free digest-log percentage: dlog_free_pct
- Number of write requests initiated by XDR that succeeded on the namespace on this node: xdr_write_success
Errors:
- Number of records being relogged at the source: dlog_relogged
- Number of errors while shipping records: xdr_ship_source_error
- Number of errors from the remote while shipping: xdr_ship_destination_error
- Number of local read errors: xdr_read_error
At the per-DC level:
- Moving average of shipping latency for the specific datacenter: dc_ship_latency_avg
- Time lag for this specific datacenter: dc_timelag
Digest-log latency for reads and writes: these statistics no longer exist as of version 3.9.
Latency to read the records from the local Aerospike server:
- Moving average latency to read a record/batch of records from local Aerospike server: xdr_read_latency_avg
For server versions between 3.8.1 and 3.9.0
General Performance:
- Average latency to ship a record to remote Aerospike cluster: latency_avg_ship
- Maximum of time lag across all remote data centers: xdr_timelag
- Number of outstanding records: stat_recs_outstanding
- Current throughput: cur_throughput
- Free digest-log percentage: free-dlog-pct
Errors:
- Number of records being relogged at the source: stat_recs_relogged
- Number of errors while shipping records: err_ship_client
- Number of errors from the remote while shipping: err_ship_server
- Number of local read errors: local_recs_error
At the per-DC level:
- Moving average of shipping latency for the specific datacenter: dc_latency_avg_ship
- Time lag for this specific datacenter: dc_timelag
Digest-log latency for reads and writes (note that these no longer exist in version 3.9, as they were potentially not critical to monitor):
- Moving average latency to read from the digest log: latency_avg_dlogread
- Moving average latency to write to the digest log: latency_avg_dlogwrite
Latency to read the records from the local Aerospike server:
- Moving average latency to read a record/batch of records from local Aerospike server: local_recs_fetch_avg_latency
For server version 3.8.0 and earlier
General Performance:
- Average latency to ship a record to remote Aerospike cluster: latency_avg_ship
- Maximum of time lag across all remote data centers: timediff_lastship_cur_secs
- Number of outstanding records: stat_recs_outstanding
- Current throughput: cur_throughput
- Free digest-log percentage: free-dlog-pct
Errors:
- Number of records being relogged at the source: stat_recs_relogged
- Number of errors while shipping records: err_ship_client
- Number of errors from the remote while shipping: err_ship_server
- Number of local read errors: local_recs_error
At the per-DC level:
- Not available in these versions: DC-level statistics were introduced in server version 3.8.1.
Digest-log latency for reads and writes (note that these no longer exist in version 3.9, as they were potentially not critical to monitor):
- Moving average latency to read from the digest log: latency_avg_dlogread
- Moving average latency to write to the digest log: latency_avg_dlogwrite
Latency to read the records from the local Aerospike server:
- Moving average latency to read a record/batch of records from local Aerospike server: local_recs_fetch_avg_latency
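Because the statistic names changed between releases, a monitoring script that spans several server versions may want a lookup table. The sketch below is built only from the names listed in the version sections above. The range labels and the `metric_name` helper are hypothetical, and an entry of None means this article lists no statistic with the same description for that range.

```python
# Version ranges, in the column order used by METRIC_NAMES.
VERSION_RANGES = ("<=3.8.0", "3.8.1-3.9.0", "3.9.0-5.x", "5.x+")

# Description -> statistic name per version range (None: not listed above).
METRIC_NAMES = {
    "Average latency to ship a record": (
        "latency_avg_ship", "latency_avg_ship", "xdr_ship_latency_avg", "latency_ms"),
    "Current throughput": (
        "cur_throughput", "cur_throughput", "xdr_throughput", "throughput"),
    "Maximum time lag across all remote data centers": (
        "timediff_lastship_cur_secs", "xdr_timelag", "xdr_timelag", None),
    "Number of outstanding records": (
        "stat_recs_outstanding", "stat_recs_outstanding",
        "xdr_ship_outstanding_objects", None),
    "Free digest-log percentage": (
        "free-dlog-pct", "free-dlog-pct", "dlog_free_pct", None),
    "Records relogged at the source": (
        "stat_recs_relogged", "stat_recs_relogged", "dlog_relogged", None),
    "Errors while shipping records": (
        "err_ship_client", "err_ship_client", "xdr_ship_source_error", None),
    "Errors from the remote while shipping": (
        "err_ship_server", "err_ship_server", "xdr_ship_destination_error", None),
    "Local read errors": (
        "local_recs_error", "local_recs_error", "xdr_read_error", None),
}


def metric_name(description, version_range):
    """Return the statistic name for a description in a given version range."""
    column = VERSION_RANGES.index(version_range)
    return METRIC_NAMES[description][column]


if __name__ == "__main__":
    for label in VERSION_RANGES:
        print(label, metric_name("Current throughput", label))
```

The 5.x statistics for lag and in-progress records are left out of the table because their descriptions differ from the older ones (per data center rather than a maximum across data centers, for example).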
Reference
See our metrics reference for details on the statistics above and on other available statistics.
Keywords
XDR MONITOR STATISTICS
Timestamp
June 2020