We use aerospike-server-community 3.6.0-1 on Ubuntu Server 12.04 (RAM: 48G, CPU: 2 x Intel Xeon X5650), and we are seeing periodic read/write latency spikes, as shown in this screenshot: Screenshot - cc96722695ea64e0bbfac41c45c5591b - Gyazo
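We investigated the issue with asloglatency. In the output below the columns are: slice end time, slice length in seconds, % of operations slower than 1 ms, % slower than 8 ms, % slower than 64 ms, and ops/sec.
asloglatency -h reads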
10:45:47 10 2.46 1.18 0.00 1023.9
10:45:57 10 4.93 2.34 0.00 1810.4
10:46:07 10 6.33 2.86 0.00 3013.8
10:46:17 10 7.88 3.84 0.04 4711.5
10:46:27 10 8.25 3.64 0.00 5583.8
10:46:37 10 47.37 43.80 39.64 4163.2
10:46:47 10 7.75 3.53 0.00 5008.5
10:46:57 10 6.13 2.82 0.00 2882.8
10:47:07 10 1.99 0.94 0.00 1185.6
10:47:17 10 6.88 3.52 0.08 3738.9
10:47:27 10 7.24 3.38 0.00 4865.8
10:47:37 10 7.95 3.43 0.00 5490.7
10:47:47 10 39.72 35.68 31.48 4427.7
10:47:57 10 7.38 3.31 0.00 5087.1
10:48:07 10 6.17 2.62 0.00 3650.3
10:48:17 10 2.76 1.24 0.00 1358.6
10:48:27 10 7.89 3.67 0.00 5340.3
10:48:37 10 7.21 3.30 0.00 5508.1
asloglatency -h reads_storage_read
10:45:47 10 2.43 1.22 0.00 968.1
10:45:57 10 5.07 2.42 0.00 1709.3
10:46:07 10 6.52 3.00 0.00 2820.2
10:46:17 10 8.06 3.88 0.04 4399.7
10:46:27 10 8.52 3.80 0.00 5238.0
10:46:37 10 43.97 4.67 1.06 3999.9
10:46:47 10 7.86 3.63 0.00 4806.4
10:46:57 10 6.27 2.93 0.00 2743.5
10:47:07 10 2.04 0.98 0.00 1117.7
10:47:17 10 6.85 3.52 0.08 3523.2
10:47:27 10 7.51 3.54 0.00 4549.7
10:47:37 10 8.31 3.62 0.00 5123.6
10:47:47 10 36.98 5.16 0.94 4209.1
10:47:57 10 7.52 3.41 0.00 4878.0
10:48:07 10 6.26 2.69 0.00 3478.1
10:48:17 10 2.89 1.33 0.00 1271.9
10:48:27 10 8.17 3.84 0.00 5019.4
10:48:37 10 7.49 3.46 0.00 5150.5
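It looks like the problem is related to device (SSD) performance.
As you can see, during the spikes (10:46:37 and 10:47:47) the “reads_storage_read” slowdown is mainly in the over-1 ms bucket (43.97% and 36.98%), with only about 1% over 64 ms, but the overall “reads” latency is over 64 ms for 39.64% and 31.48% of operations. What could be the reason?
Below I have provided our configuration and top output. Should I provide more details?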
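To make the gap concrete, here is a minimal Python sketch that subtracts the over-64 ms share of “reads_storage_read” from that of “reads” for the two spike slices. The rows are copied from the output above; the column order is assumed to be time, slice seconds, % > 1 ms, % > 8 ms, % > 64 ms, ops/sec, and the helper names `parse` and `unexplained_over_64` are made up for illustration. The two histograms count slightly different numbers of operations, so the result is only a rough estimate.

```python
# Spike slices copied from the asloglatency output in this post.
# Assumed columns: time, slice sec, % > 1 ms, % > 8 ms, % > 64 ms, ops/sec.
READS = """\
10:46:37 10 47.37 43.80 39.64 4163.2
10:47:47 10 39.72 35.68 31.48 4427.7
"""
STORAGE_READ = """\
10:46:37 10 43.97 4.67 1.06 3999.9
10:47:47 10 36.98 5.16 0.94 4209.1
"""


def parse(text):
    """Map each slice time to its latency percentages and ops/sec."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        over_1, over_8, over_64, ops = map(float, parts[2:])
        rows[parts[0]] = {
            "over_1": over_1,
            "over_8": over_8,
            "over_64": over_64,
            "ops": ops,
        }
    return rows


def unexplained_over_64(reads, storage):
    """Share of slow (> 64 ms) reads not accounted for by the storage read step."""
    return {
        t: round(reads[t]["over_64"] - storage[t]["over_64"], 2)
        for t in reads
        if t in storage
    }


gap = unexplained_over_64(parse(READS), parse(STORAGE_READ))
for slice_time, pct in gap.items():
    print(f"{slice_time}: ~{pct}% of reads are > 64 ms outside the storage read")
```

If the assumption about the columns holds, roughly 30 to 40 percentage points of the slow reads in those slices are not explained by the device read itself, which is the part we cannot account for.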
Our server configuration:
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 24
    transaction-queues 24
    transaction-threads-per-queue 4
    proto-fd-max 15000
}
namespace ssd {
    replication-factor 1
    memory-size 30G
    default-ttl 30d # 30 days, use 0 to never expire/evict.
    high-water-memory-pct 90
    high-water-disk-pct 90
    storage-engine device {
        device /dev/sdb
        device /dev/sdc
        # The 2 lines below optimize for SSD.
        scheduler-mode noop
        write-block-size 128K # adjust block size to make it efficient for SSDs
        defrag-lwm-pct 54
        defrag-startup-minimum 5
        #data-in-memory true # Store data in memory in addition to file.
    }
}
namespace devices {
    replication-factor 1
    memory-size 2G
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    high-water-memory-pct 99
    high-water-disk-pct 99
    storage-engine device {
        file /opt/aerospike/data/devices.dat
        data-in-memory true # Store data in memory in addition to file.
    }
}
Server top output:
top - 14:08:35 up 91 days, 23:16, 4 users, load average: 7.92, 10.27, 11.64
Tasks: 305 total, 2 running, 303 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.9%us, 3.8%sy, 0.0%ni, 73.2%id, 14.2%wa, 0.0%hi, 1.0%si, 0.0%st
Mem: 49451000k total, 45280444k used, 4170556k free, 173308k buffers
Swap: 33542140k total, 18520k used, 33523620k free, 15225032k cached
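For anyone who wants to track the I/O wait figure over time rather than from a single snapshot, a small Python sketch like the following could pull the fields out of the `Cpu(s):` line. The line is copied from the top output above; the function name `parse_cpu_line` is hypothetical, and the regular expression assumes the older procps layout shown here (`14.2%wa`), not the newer `14.2 wa` style.

```python
import re

# Summary line copied from the top output in this post.
CPU_LINE = "Cpu(s): 7.9%us, 3.8%sy, 0.0%ni, 73.2%id, 14.2%wa, 0.0%hi, 1.0%si, 0.0%st"


def parse_cpu_line(line):
    """Return a dict such as {'us': 7.9, 'wa': 14.2, ...} from a 'Cpu(s):' line."""
    return {name: float(value) for value, name in re.findall(r"([\d.]+)%(\w+)", line)}


cpu = parse_cpu_line(CPU_LINE)
print(f"iowait: {cpu['wa']}%  idle: {cpu['id']}%")
```

A single snapshot cannot show whether I/O wait rises during the 10-second spike slices, so the values would need to be sampled at the same times as the asloglatency slices to be comparable.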