I use Aerospike 3.9.1 on a 6-node cluster. The write block size is 1 MB and the disks are SSDs. Each record is close to 1 MB and is a sorted map. I found the total TPS is less than 200 (30-50 per node). Is that normal performance, and if so, how can I improve it? It seems too slow.
Please share your configuration and the make/model of your SSDs.
If I use more clients to write, the server throws the following critical messages:
Jun 05 2017 22:43:01 GMT: INFO (info): (ticker.c:164) NODE-ID bb916b0d4efd1a2 CLUSTER-SIZE 6
Jun 05 2017 22:43:01 GMT: INFO (info): (ticker.c:226) system-memory: free-kbytes 6291456 free-pct 100
Jun 05 2017 22:43:01 GMT: INFO (info): (ticker.c:240) in-progress: tsvc-q 0 info-q 0 nsup-delete-q 0 rw-hash 2 proxy-hash 0 rec-refs 4396958
Jun 05 2017 22:43:01 GMT: INFO (info): (ticker.c:262) fds: proto (23,1084371,1084348) heartbeat (5,8,3) fabric (81,89,8)
Jun 05 2017 22:43:01 GMT: INFO (info): (ticker.c:271) heartbeat-received: self 0 foreign 10989587
Jun 05 2017 22:43:01 GMT: INFO (info): (ticker.c:295) early-fail: demarshal 0 tsvc-client 12 tsvc-batch-sub 0 tsvc-udf-sub 0
Jun 05 2017 22:43:01 GMT: INFO (info): (ticker.c:328) {production} objects: all 4396958 master 2242224 prole 2154734
Jun 05 2017 22:43:01 GMT: INFO (info): (ticker.c:366) {production} migrations: complete
Jun 05 2017 22:43:01 GMT: INFO (info): (ticker.c:394) {production} memory-usage: total-bytes 636298050 index-bytes 281405312 sindex-bytes 354892738 used-pct 11.85
Jun 05 2017 22:43:01 GMT: INFO (info): (ticker.c:433) {production} device-usage: used-bytes 553976241536 avail-pct 74 cache-read-pct 100.00
Jun 05 2017 22:43:01 GMT: INFO (info): (ticker.c:508) {production} client: tsvc (0,0) proxy (2,0,0) read (155,0,0,53) write (2505049,169497,0) delete (556,0,0,1) udf (0,0,0) lang (0,0,0,0)
Jun 05 2017 22:43:01 GMT: INFO (info): (ticker.c:564) {production} scan: basic (7,16,0) aggr (0,0,0) udf-bg (0,0,0)
Jun 05 2017 22:43:01 GMT: INFO (info): (ticker.c:589) {production} query: basic (365,37) aggr (0,0) udf-bg (0,0)
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:137) histogram dump: {production}-read (208 total) msec
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:154) (00: 0000000053) (09: 0000000004) (10: 0000000088) (11: 0000000059)
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:163) (12: 0000000003) (13: 0000000001)
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:137) histogram dump: {production}-write (2674546 total) msec
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:154) (00: 0000874006) (01: 0000773347) (02: 0000451772) (03: 0000098166)
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:154) (04: 0000078446) (05: 0000095965) (06: 0000112145) (07: 0000111232)
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:154) (08: 0000041575) (09: 0000004748) (10: 0000006644) (11: 0000012674)
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:154) (12: 0000011261) (13: 0000002514) (14: 0000000050) (15: 0000000001)
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:137) histogram dump: {production}-query (402 total) msec
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:154) (00: 0000000226) (01: 0000000012) (02: 0000000005) (03: 0000000007)
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:154) (04: 0000000042) (05: 0000000027) (06: 0000000006) (07: 0000000001)
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:154) (08: 0000000003) (09: 0000000054) (10: 0000000008) (11: 0000000006)
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:163) (12: 0000000002) (13: 0000000001) (14: 0000000002)
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:137) histogram dump: {production}-query-rec-count (71 total) count
Jun 05 2017 22:43:01 GMT: INFO (info): (hist.c:163) (01: 0000000069) (07: 0000000002)
Jun 05 2017 22:43:11 GMT: INFO (info): (ticker.c:164) NODE-ID bb916b0d4efd1a2 CLUSTER-SIZE 6
Jun 05 2017 22:43:11 GMT: INFO (info): (ticker.c:226) system-memory: free-kbytes 6291456 free-pct 100
Jun 05 2017 22:43:11 GMT: INFO (info): (ticker.c:240) in-progress: tsvc-q 0 info-q 0 nsup-delete-q 0 rw-hash 3 proxy-hash 0 rec-refs 4397013
Jun 05 2017 22:43:11 GMT: INFO (info): (ticker.c:262) fds: proto (23,1084377,1084354) heartbeat (5,8,3) fabric (81,89,8)
Jun 05 2017 22:43:11 GMT: INFO (info): (ticker.c:271) heartbeat-received: self 0 foreign 10989920
Jun 05 2017 22:43:11 GMT: INFO (info): (ticker.c:295) early-fail: demarshal 0 tsvc-client 12 tsvc-batch-sub 0 tsvc-udf-sub 0
Jun 05 2017 22:43:11 GMT: INFO (info): (ticker.c:328) {production} objects: all 4397008 master 2242248 prole 2154760
Jun 05 2017 22:43:11 GMT: INFO (info): (ticker.c:366) {production} migrations: complete
Jun 05 2017 22:43:11 GMT: INFO (info): (ticker.c:394) {production} memory-usage: total-bytes 636306726 index-bytes 281408512 sindex-bytes 354898214 used-pct 11.85
Jun 05 2017 22:43:11 GMT: INFO (info): (ticker.c:433) {production} device-usage: used-bytes 553981039488 avail-pct 74 cache-read-pct 0.00
Jun 05 2017 22:43:11 GMT: INFO (info): (ticker.c:508) {production} client: tsvc (0,0) proxy (2,0,0) read (155,0,0,53) write (2505072,169497,0) delete (556,0,0,1) udf (0,0,0) lang (0,0,0,0)
Jun 05 2017 22:43:11 GMT: INFO (info): (ticker.c:564) {production} scan: basic (7,16,0) aggr (0,0,0) udf-bg (0,0,0)
Jun 05 2017 22:43:11 GMT: INFO (info): (ticker.c:589) {production} query: basic (365,37) aggr (0,0) udf-bg (0,0)
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:137) histogram dump: {production}-read (208 total) msec
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:154) (00: 0000000053) (09: 0000000004) (10: 0000000088) (11: 0000000059)
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:163) (12: 0000000003) (13: 0000000001)
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:137) histogram dump: {production}-write (2674569 total) msec
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:154) (00: 0000874009) (01: 0000773355) (02: 0000451775) (03: 0000098167)
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:154) (04: 0000078446) (05: 0000095965) (06: 0000112147) (07: 0000111233)
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:154) (08: 0000041575) (09: 0000004748) (10: 0000006644) (11: 0000012676)
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:154) (12: 0000011263) (13: 0000002515) (14: 0000000050) (15: 0000000001)
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:137) histogram dump: {production}-query (402 total) msec
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:154) (00: 0000000226) (01: 0000000012) (02: 0000000005) (03: 0000000007)
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:154) (04: 0000000042) (05: 0000000027) (06: 0000000006) (07: 0000000001)
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:154) (08: 0000000003) (09: 0000000054) (10: 0000000008) (11: 0000000006)
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:163) (12: 0000000002) (13: 0000000001) (14: 0000000002)
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:137) histogram dump: {production}-query-rec-count (71 total) count
Jun 05 2017 22:43:11 GMT: INFO (info): (hist.c:163) (01: 0000000069) (07: 0000000002)
Jun 05 2017 22:43:18 GMT: CRITICAL (drv_ssd): (drv_ssd.c:1316) /opt/aerospike/data/production01.dat: DEVICE FAILED write: errno 0 (Success)
Jun 05 2017 22:43:18 GMT: CRITICAL (drv_ssd): (drv_ssd.c:1316) /opt/aerospike/data/production13.dat: DEVICE FAILED write: errno 0 (Success)
Jun 05 2017 22:43:18 GMT: CRITICAL (drv_ssd): (drv_ssd.c:1316) /opt/aerospike/data/production08.dat: DEVICE FAILED write: errno 0 (Success)
Jun 05 2017 22:43:18 GMT: CRITICAL (drv_ssd): (drv_ssd.c:1316) /opt/aerospike/data/production17.dat: DEVICE FAILED write: errno 0 (Success)
Jun 05 2017 22:43:18 GMT: CRITICAL (drv_ssd): (drv_ssd.c:1316) /opt/aerospike/data/production02.dat: DEVICE FAILED write: errno 0 (Success)
Jun 05 2017 22:43:18 GMT: CRITICAL (drv_ssd): (drv_ssd.c:1316) /opt/aerospike/data/production24.dat: DEVICE FAILED write: errno 0 (Success)
Jun 05 2017 22:43:18 GMT: CRITICAL (drv_ssd): (drv_ssd.c:1316) /opt/aerospike/data/production06.dat: DEVICE FAILED write: errno 0 (Success)
Jun 05 2017 22:43:18 GMT: CRITICAL (drv_ssd): (drv_ssd.c:1316) /opt/aerospike/data/production20.dat: DEVICE FAILED write: errno 0 (Success)
Jun 05 2017 22:43:18 GMT: CRITICAL (drv_ssd): (drv_ssd.c:1316) /opt/aerospike/data/production18.dat: DEVICE FAILED write: errno 0 (Success)
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:96) SIGABRT received, aborting Aerospike Community Edition build 3.9.1 os el7
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:96) SIGABRT received, aborting Aerospike Community Edition build 3.9.1 os el7
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:96) SIGABRT received, aborting Aerospike Community Edition build 3.9.1 os el7
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:96) SIGABRT received, aborting Aerospike Community Edition build 3.9.1 os el7
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:96) SIGABRT received, aborting Aerospike Community Edition build 3.9.1 os el7
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:96) SIGABRT received, aborting Aerospike Community Edition build 3.9.1 os el7
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:96) SIGABRT received, aborting Aerospike Community Edition build 3.9.1 os el7
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:96) SIGABRT received, aborting Aerospike Community Edition build 3.9.1 os el7
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:96) SIGABRT received, aborting Aerospike Community Edition build 3.9.1 os el7
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: found 9 frames
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: found 9 frames
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: found 9 frames
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: found 9 frames
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: found 9 frames
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: found 9 frames
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: found 9 frames
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: found 9 frames
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: found 9 frames
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x35) [0x4a47a5]
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: frame 1: /lib64/libc.so.6(+0x35670) [0x7ffffd235670]
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: frame 2: /lib64/libc.so.6(gsignal+0x37) [0x7ffffd2355f7]
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: frame 3: /lib64/libc.so.6(abort+0x148) [0x7ffffd236ce8]
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: frame 4: /usr/bin/asd(cf_fault_sink_hold+0) [0x53a151]
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: frame 5: /usr/bin/asd(ssd_flush_swb+0x138) [0x515b44]
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: frame 6: /usr/bin/asd(run_ssd_maintenance+0x6b1) [0x5187a9]
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: frame 7: /lib64/libpthread.so.0(+0x7dc5) [0x7ffffee07dc5]
Jun 05 2017 22:43:18 GMT: WARNING (as): (signal.c:100) stacktrace: frame 8: /lib64/libc.so.6(clone+0x6d) [0x7ffffd2f6ced]
The config file:
# Aerospike database configuration file for use with systemd.
service {
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}
logging {
    console {
        context any info
    }
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        #mode multicast
        #address 127.0.0.1
        #port 9918
        mode mesh
        address 10.16.100.2
        port 3002
        #mesh-seed-address-port 10.16.100.2 3002
        mesh-seed-address-port 10.16.100.3 3002
        mesh-seed-address-port 10.16.100.4 3002
        mesh-seed-address-port 10.16.100.5 3002
        mesh-seed-address-port 10.16.100.6 3002
        mesh-seed-address-port 10.16.100.8 3002
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace production {
    replication-factor 2
    memory-size 5G
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    high-water-memory-pct 90
    high-water-disk-pct 90
    #storage-engine memory
    storage-engine device {
        file /opt/aerospike/data/production01.dat
        file /opt/aerospike/data/production02.dat
        file /opt/aerospike/data/production03.dat
        file /opt/aerospike/data/production04.dat
        file /opt/aerospike/data/production05.dat
        file /opt/aerospike/data/production06.dat
        file /opt/aerospike/data/production07.dat
        file /opt/aerospike/data/production08.dat
        file /opt/aerospike/data/production09.dat
        file /opt/aerospike/data/production10.dat
        file /opt/aerospike/data/production11.dat
        file /opt/aerospike/data/production12.dat
        file /opt/aerospike/data/production13.dat
        file /opt/aerospike/data/production14.dat
        file /opt/aerospike/data/production15.dat
        file /opt/aerospike/data/production16.dat
        file /opt/aerospike/data/production17.dat
        file /opt/aerospike/data/production18.dat
        file /opt/aerospike/data/production19.dat
        file /opt/aerospike/data/production20.dat
        file /opt/aerospike/data/production21.dat
        file /opt/aerospike/data/production22.dat
        file /opt/aerospike/data/production23.dat
        file /opt/aerospike/data/production24.dat
        file /opt/aerospike/data/production25.dat
        filesize 100g
        write-block-size 1M
        data-in-memory false # Store data in memory in addition to file.
        defrag-startup-minimum 10 # server needs at least 10%
    }
}
Sorry, the device overload happened because some nodes ran out of disk space.
Please run:
addr2line -fie /usr/bin/asd 0x515b44
addr2line -fie /usr/bin/asd 0x5187a9
Thanks Kevin. Sorry, this part is now clear: the devices really did run out of disk space.
Do you know how to improve the TPS and make writes to Aerospike faster?
Currently I use the synchronous Java client and it does not seem very fast. I suppose it is not an Aerospike (or sorted map) limitation, since the sync client has to wait for each write operation to complete before starting the next one. How can I make it faster?
- Use multiple threads to write.
- Use the async client.
- Use more processes to write.
Which of the above three options is best? Or is there any Aerospike server configuration that would help?
I would start by increasing the number of threads writing to the server. If you reach a limit, try increasing the number of client machines.
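For illustration, here is a minimal sketch of that approach with the Aerospike Java client: a fixed pool of threads issuing synchronous writes through one shared client instance (the client is thread-safe). The set name, bin name, payload builder, thread count, and record count are placeholders to adapt to your application.

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.policy.WritePolicy;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelWriter {
    public static void main(String[] args) throws InterruptedException {
        // One client instance shared by all writer threads.
        AerospikeClient client = new AerospikeClient("10.16.100.2", 3000);

        WritePolicy policy = new WritePolicy();
        // Tune timeouts/retries here as needed for ~1 MB records.

        int writerThreads = 16; // scale up until the cluster or the client machine saturates
        ExecutorService pool = Executors.newFixedThreadPool(writerThreads);

        for (int i = 0; i < 100000; i++) {
            final int id = i;
            pool.submit(() -> {
                Key key = new Key("production", "demo-set", "record-" + id);
                Bin bin = new Bin("payload", buildLargeValue(id));
                client.put(policy, key, bin); // blocking, but many are now in flight in parallel
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        client.close();
    }

    private static byte[] buildLargeValue(int id) {
        // Placeholder payload; in your application this would be the sorted map data (~1 MB).
        return new byte[1024 * 1024];
    }
}

The async client reaches the same goal, many writes in flight at once, with fewer threads; whichever route you take, a single blocking writer will never saturate a 6-node cluster.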
Newer versions will also attempt to choose optimal values for service-threads, transaction-queues, and transaction-threads-per-queue. You would need to upgrade the server and remove these configurations.
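As a rough sketch of what that leaves behind (not a complete config), the service stanza would shrink to something like:

    service {
        proto-fd-max 15000
        # service-threads, transaction-queues and
        # transaction-threads-per-queue removed so the
        # newer server can pick its own values
    }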
Typically, using a raw device, which bypasses the file system, is faster than using files. I'm not sure why you are using so many files; depending on your reasons, you may want to partition the device instead. Also be sure the I/O scheduler is set to noop if you are using SSDs.
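As a sketch (the device paths below are placeholders for your actual SSD partitions, not values from your config), the storage stanza would then list device entries instead of file entries:

    storage-engine device {
        # Raw SSD partitions instead of 25 files on a filesystem;
        # substitute your real partition paths here.
        device /dev/sdb1
        device /dev/sdb2
        write-block-size 1M
    }

On el7 the scheduler for a given device can be checked or changed through /sys/block/<device>/queue/scheduler.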