"EOF" error and server performance degradation

quexer · September 17, 2015, 8:10am

hi,

We found that write performance on Aerospike server 3.5.9 is not stable. From time to time, writes_master latency degrades (read performance stays normal), and the Go client reports many EOF errors during the degradation.

Here’s the output of the command "asloglatency -h writes_master -n 8":

slice-to (sec)      1      8     64    512   4096  32768  ops/sec
-------------- ------ ------ ------ ------ ------ ------ --------
.......
06:31:03    10   3.02   0.01   0.00   0.00   0.00   0.00    688.6
06:31:13    10   3.41   0.00   0.00   0.00   0.00   0.00    682.4
06:31:23    10   9.93   6.93   6.93   6.93   0.00   0.00    453.0
06:31:33    10   4.91   1.19   1.14   0.70   0.00   0.00    657.6
06:31:43    10   3.27   0.00   0.00   0.00   0.00   0.00    645.3
06:31:53    10   2.67   0.00   0.00   0.00   0.00   0.00    660.2
06:32:03    10   3.49   0.00   0.00   0.00   0.00   0.00    647.8
.......
07:00:14    10   5.31   0.05   0.00   0.00   0.00   0.00   1015.4
07:00:24    10   4.16   0.01   0.00   0.00   0.00   0.00    911.9
07:00:34    10   3.31   0.00   0.00   0.00   0.00   0.00    771.4
07:00:44    10   7.01   3.60   3.46   2.88   0.00   0.00    710.9
07:00:54    10   7.12   3.31   3.11   1.92   0.00   0.00    676.0
07:01:04    10   3.50   0.09   0.00   0.00   0.00   0.00    659.6
07:01:14    10   2.66   0.00   0.00   0.00   0.00   0.00    661.7

Is there anything wrong, and how can I fix this issue?

Thanks.

meher · September 17, 2015, 7:10pm

Thanks for reaching out on our forum. It is difficult to give much guidance based on this input alone.

My first suggestion would be to check whether there is any pattern to those degradations. Do they come at regular intervals? Then try to correlate them with other events that show a similar pattern, for example storage performance (if this namespace uses disk storage) or anything else in the Aerospike logs that happens at similar intervals.
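
One way to check for a pattern is to pull the spike timestamps out of the asloglatency output and look at the gaps between them. The following is a minimal Python sketch, not an Aerospike tool: the function names and the 1% threshold are made up for illustration, and it assumes the columns after the slice length are the percentage of writes slower than 1, 8, 64, ... ms.

```python
from datetime import datetime

# Rows copied from the asloglatency output in the first post.
SAMPLE = """\
06:31:03    10   3.02   0.01   0.00   0.00   0.00   0.00    688.6
06:31:13    10   3.41   0.00   0.00   0.00   0.00   0.00    682.4
06:31:23    10   9.93   6.93   6.93   6.93   0.00   0.00    453.0
06:31:33    10   4.91   1.19   1.14   0.70   0.00   0.00    657.6
06:31:43    10   3.27   0.00   0.00   0.00   0.00   0.00    645.3
06:31:53    10   2.67   0.00   0.00   0.00   0.00   0.00    660.2
06:32:03    10   3.49   0.00   0.00   0.00   0.00   0.00    647.8
07:00:14    10   5.31   0.05   0.00   0.00   0.00   0.00   1015.4
07:00:24    10   4.16   0.01   0.00   0.00   0.00   0.00    911.9
07:00:34    10   3.31   0.00   0.00   0.00   0.00   0.00    771.4
07:00:44    10   7.01   3.60   3.46   2.88   0.00   0.00    710.9
07:00:54    10   7.12   3.31   3.11   1.92   0.00   0.00    676.0
07:01:04    10   3.50   0.09   0.00   0.00   0.00   0.00    659.6
07:01:14    10   2.66   0.00   0.00   0.00   0.00   0.00    661.7
"""


def parse_rows(text):
    """Yield (time, pct_over_8ms, ops_per_sec) for each data row."""
    for line in text.splitlines():
        parts = line.split()
        # Data rows: time, slice seconds, six percentage buckets, ops/sec.
        if len(parts) != 9 or parts[0].count(":") != 2:
            continue  # skip headers, separators and "......." lines
        when = datetime.strptime(parts[0], "%H:%M:%S")
        yield when, float(parts[3]), float(parts[8])


def find_spikes(rows, threshold_pct=1.0):
    """Return slice times where more than threshold_pct of writes took over 8 ms."""
    return [when for when, pct_over_8ms, _ in rows if pct_over_8ms > threshold_pct]


def gaps_seconds(times):
    """Seconds between consecutive spike slices (no handling of midnight wrap)."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


spikes = find_spikes(parse_rows(SAMPLE))
for when in spikes:
    print("spike at", when.strftime("%H:%M:%S"))
print("gaps between spike slices (s):", gaps_seconds(spikes))
```

Running this over a longer capture would show whether the gaps cluster around a fixed period or are irregular.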

quexer · September 18, 2015, 2:46am

hi meher,

It’s not at a regular interval. The interval could be 3 minutes, 1 hour, 20 hours …

The cluster is composed of 5 nodes and serves only 1 namespace.

The namespace is memory-only with HDD storage. Here’s the config:

namespace foobar {
	replication-factor 2
	memory-size 32G
	storage-engine memory
	ldt-enabled true

	storage-engine device {
		file /data/push.dat
		filesize 128G
		data-in-memory true
	}
}

The memory usage is about 40%.

Thanks

meher · September 18, 2015, 5:40am

I notice that LDT are enabled on this namespace. Is there LDT traffic going on at the same time as the regular write traffic?

Could you record iostat and capture how it evolves during those latency spikes? (to check if it could be caused by the underlying device).
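
To line the iostat samples up with the asloglatency slices, it helps to have a wall-clock time on every line. Here is a rough Python sketch that does that. It assumes the sysstat `iostat` binary is on the PATH and uses a 10-second interval to match the asloglatency slices above; `stamp_lines` and `record_iostat` are illustrative names, not part of any Aerospike tooling.

```python
import subprocess
import sys
from datetime import datetime


def stamp_lines(lines, now=datetime.now):
    """Prefix each non-empty line with the current HH:MM:SS time."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield f"{now():%H:%M:%S} {line}"


def record_iostat(interval_sec=10, out=sys.stdout):
    """Run `iostat -x <interval>` until interrupted and write timestamped lines."""
    cmd = ["iostat", "-x", str(interval_sec)]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for stamped in stamp_lines(proc.stdout):
            print(stamped, file=out, flush=True)
```

Calling `record_iostat()` and redirecting the output to a file on each node gives a log whose lines for the device backing /data/push.dat can be compared against the spike times.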

You could also turn on microbenchmarks and analyze as described in this post about write performance analysis, especially to see if those latency spikes are caused by the network.
