Monitoring XDR performance


I have two sites, each with a two-node cluster, using replication factor 2.

The goal is to monitor XDR shipping performance by comparing the available metrics: xdr_ship_success, client_write_success, dlog_logged, xdr_throughput, etc.

If possible, I’d like to see the performance at the node level first, then at the cluster level, and then globally.

I found that:

  • client_write_success includes all records, both local (from site 1) and remote (from site 2): is there any way to know how many of them are local and how many are remote?
  • dlog_logged looks like it sums up all the records to ship from the two nodes of the cluster in site 1: is there any way to know how many of them come from each node?

In particular, I’m focusing on node-level XDR performance. Is that doable based on these metrics, or can anyone suggest another way?

Are there any good articles on this topic?


This is the stat for the writes originating from an XDR client:

dlog_logged shows all the digest log entries on a single node. It includes both master and prole records, even though the node only processes and ships master records. The exception is when a node goes down: the other nodes then start shipping the prole records that match the master records owned by the node that went down. On a 2-node cluster, both nodes log everything, but that’s a special case (assuming replication factor 2).
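For the node / cluster / global views, here is a rough sketch of how the per-node counters could be rolled up. It assumes each node's statistics are available as semicolon-separated key=value text; the helper names (parse_stats, roll_up), the site and node labels, and all the numbers are made up for illustration. Keep in mind that with 2 nodes and replication factor 2 both nodes log everything, so a summed dlog_logged counts each record twice.

```python
from collections import defaultdict


def parse_stats(raw):
    """Parse a 'key=value;key=value' statistics string into a dict of ints."""
    stats = {}
    for pair in raw.strip().split(";"):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        try:
            stats[key] = int(value)
        except ValueError:
            pass  # ignore non-numeric values such as strings or booleans
    return stats


def roll_up(samples, metrics):
    """Aggregate the chosen metrics per node, per cluster (site), and globally.

    samples maps (site, node) -> raw statistics string.
    """
    per_node = {}
    per_cluster = defaultdict(lambda: dict.fromkeys(metrics, 0))
    overall = dict.fromkeys(metrics, 0)
    for (site, node), raw in samples.items():
        stats = parse_stats(raw)
        node_view = {m: stats.get(m, 0) for m in metrics}
        per_node[(site, node)] = node_view
        for m in metrics:
            per_cluster[site][m] += node_view[m]
            overall[m] += node_view[m]
    return per_node, dict(per_cluster), overall


# Invented sample data: two sites, two nodes each.
samples = {
    ("site1", "node1"): "xdr_ship_success=100;dlog_logged=250;client_write_success=300",
    ("site1", "node2"): "xdr_ship_success=120;dlog_logged=250;client_write_success=310",
    ("site2", "node1"): "xdr_ship_success=80;dlog_logged=150;client_write_success=200",
    ("site2", "node2"): "xdr_ship_success=90;dlog_logged=150;client_write_success=210",
}
metrics = ["xdr_ship_success", "dlog_logged", "client_write_success"]

per_node, per_cluster, overall = roll_up(samples, metrics)
print(per_node[("site1", "node1")])
print(per_cluster["site1"])
print(overall)
```

Since these are cumulative counters, taking two snapshots and dividing the difference by the interval would give a per-node rate to compare across nodes.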

Since you seem to have an Enterprise build (XDR), I would expect you to be in touch with someone at Aerospike. There are quite a few metrics, plus extra debug/tracing functionality, that can be turned on for detailed performance analysis.



Thanks! I had missed that metric. It helps with my use case.