Monitoring XDR performance


#1

I have two sites, each with a two-node cluster, and replication factor 2.

The goal is to monitor XDR shipping performance by comparing the available metrics: xdr_ship_success, client_write_success, dlog_logged, xdr_throughput, etc.

If possible, I’d like to see the performance at the node level first, then at the cluster level, and then globally.

I found that:

  • client_write_success includes all records written at the site, both local ones (site 1) and remote ones (shipped from site 2): is there any way to know how many are local and how many are remote?
  • dlog_logged looks like it sums up all the records to ship from the two nodes of the cluster in site 1: is there any way to know how many of them come from each node?

I’m mainly interested in node-level XDR performance. Is that doable with these metrics, or can anyone suggest another way?

Any good articles on this issue?


#2

This is the stat for the writes originating from an XDR client:

http://www.aerospike.com/docs/reference/metrics#xdr_write_success
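
As a rough illustration of how this metric could answer the local vs. remote question, here is a minimal Python sketch. It assumes, as observed in post #1, that client_write_success counts both local and XDR-originated writes, so subtracting xdr_write_success leaves the local ones. The input is a semicolon-separated `name=value` statistics string; the sample values and the helper names `parse_stats` and `split_writes` are hypothetical.

```python
def parse_stats(raw):
    """Parse a 'name=value;name=value' string, keeping only integer values."""
    stats = {}
    for pair in raw.strip().split(";"):
        if "=" not in pair:
            continue  # skip malformed or empty fragments
        name, value = pair.split("=", 1)
        try:
            stats[name] = int(value)
        except ValueError:
            pass  # non-numeric stats are not needed here
    return stats


def split_writes(stats):
    """Split successful writes into local and remote (XDR-originated).

    Assumption: client_write_success includes the XDR writes, so the
    local count is the difference between the two counters.
    """
    total = stats.get("client_write_success", 0)
    remote = stats.get("xdr_write_success", 0)
    return {"local": total - remote, "remote": remote}


# Hypothetical statistics string for one node.
sample = "client_write_success=1500;xdr_write_success=400;cluster_size=2"
result = split_writes(parse_stats(sample))
print(result)  # {'local': 1100, 'remote': 400}
```

Running this against the statistics of each node separately would give the split per node; whether the subtraction is valid depends on the server version, so it is worth checking against the metrics reference.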

dlog_logged shows all the digest log entries on a single node. It includes both master and prole records, even though the node only processes and ships master records. The exception is when a node goes down: the other nodes then start shipping the prole records that correspond to the master records owned by the failed node. A 2-node cluster with replication factor 2 is a special case, because both nodes log everything.

Since you seem to have an Enterprise build (XDR), I would expect you to be in touch with someone at Aerospike. There are quite a few metrics, plus extra debug and tracing functionality, that can be turned on for detailed performance analysis.
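
For node-level and then cluster-level numbers, one possible approach is to sample a cumulative counter such as xdr_ship_success on each node at two points in time and turn the difference into a rate. The sketch below is only illustrative: the node names, counter values, and the 60-second interval are made up, and `counter_rate` is a hypothetical helper. Summing per-node rates makes sense for a counter where each node reports only its own work; dlog_logged on a 2-node, replication factor 2 cluster would be double counted this way, since both nodes log everything.

```python
def counter_rate(before, after, seconds):
    """Return the per-second rate of a cumulative counter between two samples."""
    if seconds <= 0:
        raise ValueError("sampling interval must be positive")
    return (after - before) / seconds


# Hypothetical xdr_ship_success samples per node, taken 60 seconds apart:
# node name -> (value at first sample, value at second sample)
samples = {
    "node1": (1000, 1600),
    "node2": (2000, 2300),
}
interval = 60

# Node-level shipping rate (records per second).
node_rates = {
    node: counter_rate(before, after, interval)
    for node, (before, after) in samples.items()
}

# Cluster-level rate as the sum of the node-level rates.
cluster_rate = sum(node_rates.values())

print(node_rates)    # {'node1': 10.0, 'node2': 5.0}
print(cluster_rate)  # 15.0
```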

Thanks.


#3

Thanks! I had missed that metric. It helps with my use case.