Hi, I am using aerospike-6.1.0.1 and running into an issue where write throughput drops from 1000 ops/second to 0.8 ops/second (some writes even time out). When I set up a new cluster with 2 nodes and replication-factor 2, it works perfectly fine at 1000 ops/second, but if I restart one node and wait for migrations to complete, write throughput drops to a mere 0.8 ops/second. Reads remain fine, and if I reduce the replication factor from 2 to 1, writes are fine as well.
The issue happens only after I restart one node. If I remove the whole Aerospike setup and install it again, it works fine until a node is restarted.
The two nodes are AWS i3.2xlarge instances.
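In case it matters, this is how I confirm that migrations have actually finished before measuring throughput (assuming asinfo is available on the host; the IP is one of my nodes):

asinfo -h 172.31.19.106 -v 'statistics' | tr ';' '\n' | grep migrate_partitions_remaining
# shows migrate_partitions_remaining=0 on both nodes before I start the write workload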
The log that gets printed while writes are slow (I don't see any error or warning logs):
Apr 26 2023 21:52:05 GMT: INFO (drv_ssd): (drv_ssd.c:1893) {ycsb} /dev/nvme0n1: used-bytes 98496 free-wblocks 14495783 write-q 0 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:163) NODE-ID bb9ae55b9355c06 CLUSTER-SIZE 2
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:234) cluster-clock: skew-ms 0
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:255) system: total-cpu-pct 0 user-cpu-pct 0 kernel-cpu-pct 0 free-mem-kbytes 62056704 free-mem-pct 98 thp-mem-kbytes 0
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:277) process: cpu-pct 1 threads (9,61,104,104) heap-kbytes (1545219,1547368,1784320) heap-efficiency-pct 99.9
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:287) in-progress: info-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0 long-queries 0
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:311) fds: proto (2,133,131) heartbeat (1,7,6) fabric (26,38,12)
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:320) heartbeat-received: self 4 foreign 6980
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:346) fabric-bytes-per-second: bulk (0,0) ctrl (0,0) meta (0,0) rw (0,0)
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:405) {ycsb} objects: all 19 master 9 prole 10 non-replica 0
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:469) {ycsb} migrations: complete
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:495) {ycsb} memory-usage: total-bytes 1216 index-bytes 1216 set-index-bytes 0 sindex-bytes 0 used-pct 0.00
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:564) {ycsb} device-usage: used-bytes 98496 avail-pct 99 cache-read-pct 0.00
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:613) {ycsb} client: tsvc (0,0) proxy (0,0,0) read (0,0,0,0,0) write (8,0,1,0) delete (0,0,0,0,0) udf (0,0,0,0) lang (0,0,0,0)
Apr 26 2023 21:52:05 GMT: INFO (info): (ticker.c:980) {ycsb} retransmits: migration 87 all-read 0 all-write (0,15) all-delete (0,0) all-udf (0,0) all-batch-sub 0 udf-sub (0,0) ops-sub (0,0)
Apr 26 2023 21:52:05 GMT: INFO (info): (hist.c:320) histogram dump: {ycsb}-write (8 total) msec
Apr 26 2023 21:52:05 GMT: INFO (info): (hist.c:331) (00: 0000000001) (10: 0000000004) (11: 0000000002) (12: 0000000001)
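The only line that looks suspicious to me is the retransmits one: if I read the ticker format correctly, the second value in all-write (0,15) counts replica-write retransmits, which would fit writes stalling on the replica node while reads and RF=1 writes stay fast. To rule out a basic connectivity problem after the restart, I checked that the fabric port (3001) is reachable in both directions between the nodes (IPs are from my config below):

nc -zv 172.31.24.176 3001
nc -zv 172.31.19.106 3001
# both report the connection succeeded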
My aerospike.conf:
# Aerospike database configuration file.
service {
    cluster-name AerospikeCluster1
    service-threads 40
    proto-fd-max 15000
}

logging {
    # Log file must be an absolute path.
    file /home/ubuntu/aerospike.log {
        context any info
    }
}

network {
    service {
        address any
        port 3000
        access-address 172.31.19.106
    }

    heartbeat {
        mode mesh
        port 3002
        mesh-seed-address-port 172.31.24.176 3002
        mesh-seed-address-port 172.31.19.106 3002
        interval 150
        timeout 50
    }

    fabric {
        send-threads 8
        port 3001
    }

    info {
        port 3003
    }
}

namespace ycsb {
    replication-factor 2
    memory-size 40G

    storage-engine device {
        device /dev/nvme0n1
        write-block-size 128K
    }
}
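I also verified that both nodes are really running with this configuration after the restart, by querying the effective namespace settings (assuming asinfo here again; ycsb is my namespace):

asinfo -h 172.31.19.106 -v 'get-config:context=namespace;id=ycsb' | tr ';' '\n' | grep replication-factor
# replication-factor=2 on both nodes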