I have two sites, each running a two-node cluster, with replication factor 2.
The goal is to monitor XDR shipping performance by comparing the available metrics: xdr_ship_success, client_write_success, dlog_logged, xdr_throughput, etc.
If possible, I'd like to see performance at the node level first, then at the cluster level, and finally globally.
I found that:
client_write_success includes all records, both local (written at site 1) and remote (shipped from site 2). Is there any way to tell how many of them are local and how many are remote?
dlog_logged appears to sum all the records to be shipped from both nodes of the site 1 cluster. Is there any way to know how many of them come from each node?
In particular, I'm focusing on node-level XDR performance. Is that doable with these metrics, or can anyone suggest another approach?
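One generic way to get node-, cluster-, and global-level views from cumulative counters is to snapshot each node's statistics twice, turn the differences into per-second rates, and then sum the rates per site and across sites. The sketch below assumes the statistics have already been fetched as the usual `key=value;key=value` info text (for example with `asinfo`). Which info command returns which metric depends on the server version, so that part is left out. The snapshot values and the node and site names are hypothetical.

```python
"""Roll up XDR-related counters from node level to cluster and global level."""

from typing import Dict, Iterable, Tuple

# Counters named in the question; which info command returns each one
# depends on the server version (assumption: already fetched as text).
METRICS = ("xdr_ship_success", "dlog_logged", "client_write_success")


def parse_info(response: str) -> Dict[str, float]:
    """Parse a 'key=value;key=value' info response, keeping numeric values."""
    stats: Dict[str, float] = {}
    for item in response.strip().split(";"):
        key, sep, value = item.partition("=")
        if not sep:
            continue  # empty or malformed item
        try:
            stats[key] = float(value)
        except ValueError:
            pass  # skip non-numeric values such as "true" or version strings
    return stats


def node_rates(before: str, after: str, interval_s: float,
               metrics: Iterable[str] = METRICS) -> Dict[str, float]:
    """Per-second rate of each counter between two snapshots of one node."""
    b, a = parse_info(before), parse_info(after)
    return {m: (a.get(m, 0.0) - b.get(m, 0.0)) / interval_s for m in metrics}


def rollup(rates: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Sum per-node (or per-cluster) rates into one total per metric."""
    total: Dict[str, float] = {}
    for r in rates:
        for m, v in r.items():
            total[m] = total.get(m, 0.0) + v
    return total


if __name__ == "__main__":
    # Hypothetical snapshots taken 10 s apart: {(site, node): (before, after)}
    interval = 10.0
    snapshots: Dict[Tuple[str, str], Tuple[str, str]] = {
        ("site1", "node1"): ("xdr_ship_success=100;dlog_logged=200;client_write_success=150",
                             "xdr_ship_success=600;dlog_logged=1200;client_write_success=650"),
        ("site1", "node2"): ("xdr_ship_success=0;dlog_logged=200;client_write_success=100",
                             "xdr_ship_success=300;dlog_logged=1200;client_write_success=400"),
        ("site2", "node3"): ("xdr_ship_success=50;dlog_logged=80;client_write_success=60",
                             "xdr_ship_success=250;dlog_logged=480;client_write_success=260"),
        ("site2", "node4"): ("xdr_ship_success=20;dlog_logged=80;client_write_success=40",
                             "xdr_ship_success=220;dlog_logged=480;client_write_success=240"),
    }
    per_node = {k: node_rates(b, a, interval) for k, (b, a) in snapshots.items()}
    per_site = {site: rollup(r for (s, _), r in per_node.items() if s == site)
                for site in sorted({s for s, _ in per_node})}
    for key, r in per_node.items():
        print("node", key, r)
    for site, r in per_site.items():
        print("cluster", site, r)
    print("global", rollup(per_site.values()))
```

Summing is straightforward for counters that each node increments only for its own work, such as xdr_ship_success. Summing dlog_logged across the nodes of a cluster counts a record once for every replica that logged it, so the cluster total is not the number of distinct records to ship.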
This is the stat for the writes originating from an XDR client:
dlog_logged counts all the digest log entries on a single node. It includes both master and prole records, even though the node only processes and ships the master records. The exception is when a node goes down: the other nodes then start shipping the prole records that correspond to the master records owned by the failed node. On a 2-node cluster with replication factor 2, both nodes log every record, so that is a special case.
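To make the master/prole behaviour concrete, here is a toy simulation. Every replica of a partition logs the digest, but only the master ships it, and the first live prole takes over if the master is down. The round-robin partition placement in `replicas()` is a hypothetical simplification, not Aerospike's real partition map. `simulate()` is likewise an illustrative helper, not an Aerospike API. It assumes the entries were logged while all nodes were up.

```python
from collections import Counter
from typing import Collection, Iterable, List, Tuple

N_PARTITIONS = 4096  # each Aerospike namespace has 4096 partitions


def replicas(partition: int, nodes: List[str], rf: int) -> List[str]:
    """Master first, then proles. Hypothetical round-robin placement for
    illustration only; Aerospike's real partition map is computed differently."""
    start = partition % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


def simulate(nodes: List[str], rf: int, write_partitions: Iterable[int],
             down: Collection[str] = ()) -> Tuple[Counter, Counter]:
    """Return (digest-log entries per node, records shipped per node).

    Entries are assumed to be logged while all nodes are up; `down` lists
    nodes that are unavailable at shipping time."""
    logged: Counter = Counter()
    shipped: Counter = Counter()
    for p in write_partitions:
        owners = replicas(p, nodes, rf)
        for node in owners:
            logged[node] += 1  # master and every prole log the digest
        # The master ships; if it is down, the first live prole takes over.
        live = [n for n in owners if n not in down]
        if live:
            shipped[live[0]] += 1
    return logged, shipped


if __name__ == "__main__":
    writes = [i % N_PARTITIONS for i in range(10_000)]
    for nodes, down in ((["A", "B"], ()), (["A", "B"], {"B"}), (["A", "B", "C"], ())):
        logged, shipped = simulate(nodes, 2, writes, down)
        print(nodes, "down:", sorted(down), "logged:", dict(logged), "shipped:", dict(shipped))
```

In the 2-node, replication factor 2 case, each node's logged count equals the total number of writes, while shipping is split between the two masters. With a third node, every write is still logged twice across the cluster but shipped only once.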
Since you appear to be running an Enterprise build (XDR), I would expect you to be in touch with someone at Aerospike. There are quite a few additional metrics, as well as debug and tracing functionality that can be turned on for detailed performance analysis.