How to identify a bad DC causing XDR throttling

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

How to identify a bad DC causing XDR throttling

Detail

As explained in the two articles DC latency and XDR throttling and XDR maintenance, all remote DCs are affected by the throttling because the same read thread sleeps by the same amount for each of them. This article focuses on identifying a bad DC that is still in the CLUSTER_UP state; a cluster in the CLUSTER_DOWN state is covered in a separate article.

Method

Run the command “show statistics dc” in asadm in order to analyze the relevant metrics.
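
For example, assuming asadm is run from one of the source cluster nodes and can reach the cluster on the default port, the same output can be captured non-interactively:

asadm -e "show statistics dc"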

In particular, an increasing moving average of the ship idle time (dc_ship_idle_avg) is an indication of a bad DC.

Due to the XDR throttling, dc_ship_idle_avg_pct and dc_ship_latency_avg will also be higher for that DC than for the other DCs.

Refer to the KB article about XDR monitoring.

Looking at the log file (by default under /var/log/aerospike/aerospike.log), we can also identify the bad DC by examining the per-DC status log line:

[DC_NAME]: dc-state CLUSTER_UP timelag-sec 2 lst 1468006386894 mlst 1468006389647 (2016-07-08 19:33:09.647 GMT) fnlst 0 wslst 0 shlat-ms 0 rsas-ms 0.004 rsas-pct 0.0 con 384 errcl 0 errsrv 0 sz 6

The slower DC will have higher rsas-pct, rsas-ms and shlat-ms values.
These correspond to the metrics dc_ship_idle_avg_pct, dc_ship_idle_avg and xdr_ship_latency_avg, respectively.
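
To compare DCs quickly, these per-DC status lines can be pulled out of the log with a simple grep, for example (assuming the default log path):

grep "dc-state" /var/log/aerospike/aerospike.log | tail -n 20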

Refer to the KB article about XDR log messages.

Examples

Consider a 3-node source cluster (DC1) shipping to a 2-node destination cluster (DC2) and another 2-node destination cluster (DC3). We write to namespace test on DC1, which ships to both DC2 and DC3, and to namespace nsSSD, which ships to DC3 only.

./run_benchmarks -h 10.0.100.254 -p 3000 -n test -k 60000 -S 1 -o S:1000 -w I -g 100
./run_benchmarks -h 10.0.100.254 -p 3000 -n nsSSD -k 60000 -S 1 -o S:1000 -w I -g 100
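
If in doubt about which namespaces ship to which DC, the XDR DC configuration can be checked with asinfo on the source nodes; for pre-5.0 XDR something along these lines should work (the exact info command and output format can vary slightly between server versions):

asinfo -v "get-dc-config"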

Initially, we have the following statistics:

Admin> show stat dc
.
.
.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~DC2 DC Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                     :   10.0.100.137:3000   10.0.100.173:3000   10.0.100.254:3000   
dc_open_conn             :   128                 128                 128                 
dc_ship_attempt          :   12299               22875               23958               
dc_ship_bytes            :   1450645             13043870            13749120            
dc_ship_delete_success   :   0                   0                   0                   
dc_ship_destination_error:   0                   0                   0                   
dc_ship_idle_avg         :   0.030               0.017               0.012               
dc_ship_idle_avg_pct     :   0.2                 0.1                 0.2                 
dc_ship_inflight_objects :   0                   0                   16                  
dc_ship_latency_avg      :   3                   4                   4                   
dc_ship_source_error     :   0                   0                   0                   
dc_ship_success          :   1337                12022               12656               
dc_size                  :   2                   2                   2                   
dc_state                 :   CLUSTER_UP          CLUSTER_UP          CLUSTER_UP          
dc_timelag               :   1                   0                   0                   

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~DC3 DC Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                     :   10.0.100.137:3000   10.0.100.173:3000   10.0.100.254:3000   
dc_open_conn             :   128                 128                 128                 
dc_ship_attempt          :   1337                12022               12672               
dc_ship_bytes            :   13355377            24830228            26005716            
dc_ship_delete_success   :   0                   0                   0                   
dc_ship_destination_error:   0                   0                   0                   
dc_ship_idle_avg         :   0.683               0.632               0.622               
dc_ship_idle_avg_pct     :   4.1                 4.7                 4.4                 
dc_ship_inflight_objects :   0                   0                   35                  
dc_ship_latency_avg      :   8                   12                  13                  
dc_ship_source_error     :   0                   0                   0                   
dc_ship_success          :   12299               22875               23923               
dc_size                  :   2                   2                   2                   
dc_state                 :   CLUSTER_UP          CLUSTER_UP          CLUSTER_UP          
dc_timelag               :   1                   0                   0           

To simulate network issues, we use iptables to drop the packets sent from node 10.0.100.254 (in DC1) to one of the remote nodes in DC2 for 30 seconds, by running the following command on that DC2 node:

iptables -I INPUT -p tcp --dport 3000 -s 10.0.100.254/32 -j DROP

iptables offers a number of ways to simulate network issues. DROP silently drops packets, simulating 100% packet loss.

Alternatively, REJECT simulates nothing listening on the port or a firewall blocking the connection, since by default REJECT responds with icmp-port-unreachable.

To simulate a host being down, you could instead use '-j REJECT --reject-with icmp-host-unreachable'.
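
Putting this together, a minimal sketch of the 30-second packet drop used in this example, to be run on the target node in DC2 (note that iptables -F flushes all rules on that node, so only use it if no other rules are needed):

iptables -I INPUT -p tcp --dport 3000 -s 10.0.100.254/32 -j DROP
sleep 30
iptables -F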

The statistics look like this after 30 seconds:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~DC2 DC Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                     :   10.0.100.137:3000   10.0.100.173:3000   10.0.100.254:3000   
dc_open_conn             :   128                 128                 67                  
dc_ship_attempt          :   14645               25247               24898               
dc_ship_bytes            :   2646315             14381675            14310065            
dc_ship_delete_success   :   0                   0                   0                   
dc_ship_destination_error:   0                   0                   269                 
dc_ship_idle_avg         :   0.018               0.010               174.406             
dc_ship_idle_avg_pct     :   0.1                 0.1                 11.4                
dc_ship_inflight_objects :   0                   0                   3                   
dc_ship_latency_avg      :   4                   6                   1054                
dc_ship_source_error     :   0                   0                   0                   
dc_ship_success          :   2439                13255               12917               
dc_size                  :   2                   2                   2                   
dc_state                 :   CLUSTER_UP          CLUSTER_UP          CLUSTER_UP          
dc_timelag               :   0                   0                   18                  

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~DC3 DC Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                     :   10.0.100.137:3000   10.0.100.173:3000   10.0.100.254:3000   
dc_open_conn             :   128                 128                 128                 
dc_ship_attempt          :   2439                13255               13187               
dc_ship_bytes            :   15902031            27404987            27026041            
dc_ship_delete_success   :   0                   0                   0                   
dc_ship_destination_error:   0                   0                   0                   
dc_ship_idle_avg         :   0.405               0.375               0.369               
dc_ship_idle_avg_pct     :   2.4                 2.8                 2.6                 
dc_ship_inflight_objects :   0                   0                   0                   
dc_ship_latency_avg      :   8                   11                  9                   
dc_ship_source_error     :   0                   0                   0                   
dc_ship_success          :   14645               25247               24898               
dc_size                  :   2                   2                   2                   
dc_state                 :   CLUSTER_UP          CLUSTER_UP          CLUSTER_UP          
dc_timelag               :   0                   0                   18                  


The dc_ship_idle_avg, dc_ship_idle_avg_pct and dc_ship_latency_avg have increased for the last node (10.0.100.254) shipping to DC2, while dc_open_conn has decreased.
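
A simple way to watch these metrics trend on the suspect source node is to poll the per-DC info command in a loop (pre-5.0 XDR; DC2 is the DC name used in this example, and the exact key names and separators in the info output vary slightly between server versions, hence the loose, case-insensitive filter):

while true; do
  asinfo -v "dc/DC2" | tr ':;' '\n' | grep -iE "open_conn|idle_avg|latency_avg|timelag"
  echo "---"
  sleep 10
done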

The log lines on the DC1 node whose packets are being dropped by one of the nodes in DC2 look like these:

Feb 02 2018 23:08:08 GMT: WARNING (xdr): (xdr_ship.c:844) Marking cluster DC2 as suspicious because of a timeout.
Feb 02 2018 23:08:08 GMT: INFO (xdr): (as_cluster.c:536) Node BB9FB13F0005452 refresh failed: AEROSPIKE_ERR_TIMEOUT 
Feb 02 2018 23:08:17 GMT: INFO (xdr): (xdr_dlog.c:95) dlog: free-pct 100 reclaimed 20300 glst 1517612888706 (2018-02-02 23:08:08 GMT)
Feb 02 2018 23:08:17 GMT: INFO (xdr): (xdr.c:607) [DC2]: dc-state CLUSTER_UP timelag-sec 8 lst 1517612888706 mlst 1517612888706 (2018-02-02 23:08:08 GMT) fnlst 0 (-) wslst 0 (-) shlat-ms 308 rsas-ms 117.880 rsas-pct 4.9 con 79 errcl 0 errsrv 250 sz 2
Feb 02 2018 23:08:17 GMT: INFO (xdr): (xdr.c:607) [DC3]: dc-state CLUSTER_UP timelag-sec 8 lst 1517612888706 mlst 1517612888706 (2018-02-02 23:08:08 GMT) fnlst 0 (-) wslst 0 (-) shlat-ms 11 rsas-ms 0.444 rsas-pct 3.1 con 128 errcl 0 errsrv 0 sz 2
.
.
.
Feb 02 2018 23:08:19 GMT: INFO (xdr): (as_cluster.c:536) Node BB9FB13F0005452 refresh failed: AEROSPIKE_ERR_TIMEOUT 
Feb 02 2018 23:08:27 GMT: INFO (xdr): (xdr.c:628) summary: throughput 3 inflight 6 dlog-outstanding 57 dlog-delta-per-sec 0.3
Feb 02 2018 23:08:27 GMT: INFO (xdr): (xdr.c:638) detail: sh 38079 ul 64 lg 31208 rlg 235 rlgi 0 rlgo 265 lproc 31151 rproc 9346 lkdproc 0 errcl 0 errsrv 265 hkskip 1 hkf 1 flat 0
.
.
.
Feb 02 2018 23:08:39 GMT: INFO (xdr): (xdr_ship.c:1593) Grace period for suspicious cluster DC2 expired. Checking its health.
Feb 02 2018 23:08:47 GMT: INFO (xdr): (xdr.c:628) summary: throughput 489 inflight 0 dlog-outstanding 53 dlog-delta-per-sec 0.4
Feb 02 2018 23:08:47 GMT: INFO (xdr): (xdr.c:638) detail: sh 43068 ul 0 lg 34444 rlg 240 rlgi 0 rlgo 271 lproc 34391 rproc 9346 lkdproc 0 errcl 0 errsrv 271 hkskip 1 hkf 1 flat 0
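
These warnings and node refresh failures can be spotted quickly with a grep over the log, for example (assuming the default log path):

grep -E "suspicious|refresh failed" /var/log/aerospike/aerospike.log | tail -n 20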

As soon as we restore the traffic, dc_open_conn returns to 128 while the other metrics slowly return to normal:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~DC2 DC Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                     :   10.0.100.137:3000   10.0.100.173:3000   10.0.100.254:3000   
dc_open_conn             :   128                 128                 128                 
dc_ship_attempt          :   17106               27563               29173               
dc_ship_bytes            :   3984120             15635935            16773015            
dc_ship_delete_success   :   0                   0                   0                   
dc_ship_destination_error:   0                   0                   271                 
dc_ship_idle_avg         :   0.010               0.006               334.640             
dc_ship_idle_avg_pct     :   0.0                 0.0                 16.9                
dc_ship_inflight_objects :   0                   0                   0                   
dc_ship_latency_avg      :   5                   6                   828                 
dc_ship_source_error     :   0                   0                   0                   
dc_ship_success          :   3672                14411               15188               
dc_size                  :   2                   2                   2                   
dc_state                 :   CLUSTER_UP          CLUSTER_UP          CLUSTER_UP          
dc_timelag               :   0                   0                   0                   

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~DC3 DC Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                     :   10.0.100.137:3000   10.0.100.173:3000   10.0.100.254:3000   
dc_open_conn             :   128                 128                 128                 
dc_ship_attempt          :   3672                14411               15459               
dc_ship_bytes            :   18573444            29919007            31666419            
dc_ship_delete_success   :   0                   0                   0                   
dc_ship_destination_error:   0                   0                   0                   
dc_ship_idle_avg         :   0.240               0.223               0.219               
dc_ship_idle_avg_pct     :   1.4                 1.6                 1.5                 
dc_ship_inflight_objects :   0                   0                   0                   
dc_ship_latency_avg      :   8                   9                   7                   
dc_ship_source_error     :   0                   0                   0                   
dc_ship_success          :   17106               27563               29173               
dc_size                  :   2                   2                   2                   
dc_state                 :   CLUSTER_UP          CLUSTER_UP          CLUSTER_UP          
dc_timelag               :   0                   0                   0                   


Let’s now run the following commands to reject packets for 30 seconds, again on the same remote node in DC2:

iptables -I INPUT -p tcp --dport 3000 -s 10.0.100.254/32 -j REJECT
sleep 30
iptables -F

The logs will then show a different error:

Feb 02 2018 23:09:17 GMT: INFO (xdr): (xdr.c:638) detail: sh 45896 ul 0 lg 39488 rlg 443 rlgi 0 rlgo 474 lproc 39441 rproc 9346 lkdproc 0 errcl 1 errsrv 473 hkskip 1 hkf 1 flat 0

.
.
.
Feb 02 2018 23:09:24 GMT: INFO (xdr): (as_cluster.c:536) Node BB9FB13F0005452 refresh failed: AEROSPIKE_ERR_CONNECTION Socket write error: 111, 10.0.100.235:3000, 52476
Feb 02 2018 23:09:25 GMT: INFO (xdr): (as_cluster.c:536) Node BB9FB13F0005452 refresh failed: AEROSPIKE_ERR_CONNECTION Socket write error: 111, 10.0.100.235:3000, 52478
Feb 02 2018 23:09:26 GMT: INFO (xdr): (as_cluster.c:536) Node BB9FB13F0005452 refresh failed: AEROSPIKE_ERR_CONNECTION Socket write error: 111, 10.0.100.235:3000, 52480
.
.
.
Feb 02 2018 23:09:27 GMT: INFO (xdr): (xdr.c:638) detail: sh 45956 ul 35 lg 41079 rlg 452 rlgi 0 rlgo 483 lproc 41079 rproc 9346 lkdproc 0 errcl 1 errsrv 482 hkskip 1 hkf 1 flat 0
.
.
.
Feb 02 2018 23:10:17 GMT: INFO (xdr): (xdr.c:607) [DC2]: dc-state CLUSTER_UP timelag-sec 0 lst 1517613016883 mlst 1517613016883 (2018-02-02 23:10:16 GMT) fnlst 0 (-) wslst 0 (-) shlat-ms 962 rsas-ms 306.784 rsas-pct 17.9 con 128 errcl 1 errsrv 491 sz 2
Feb 02 2018 23:10:17 GMT: INFO (xdr): (xdr.c:607) [DC3]: dc-state CLUSTER_UP timelag-sec 0 lst 1517613016883 mlst 1517613016883 (2018-02-02 23:10:16 GMT) fnlst 0 (-) wslst 0 (-) shlat-ms 5 rsas-ms 0.059 rsas-pct 0.4 con 128 errcl 0 errsrv 0 sz 2

This has a similar impact and can also be observed through the errsrv statistic along with shlat-ms, rsas-ms and rsas-pct. Refer to the knowledge-base article on XDR throttling for all the causes of throttling. It is possible, for example, to have throttling without any errors (for instance, latencies reaching 25% of xdr_timeout).
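
As a rough filter, a DC accumulating server-side errors can be spotted by hiding the per-DC status lines where errsrv is zero, for example:

grep "dc-state" /var/log/aerospike/aerospike.log | grep -v "errsrv 0 " | tail -n 20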

Applies To

Server prior to v. 5.0

Keywords

XDR SHIP LATENCY AVG OPEN CONNECTION THROTTLE CLUSTER_DOWN CLUSTER_UP

Timestamp

02/02/18