How to increase eviction speed to avoid stop-writes happening?


#1

Workstation: 4 cores, SAMSUNG 830 EVO SSD 128G, 16G RAM, Ubuntu 14.04

In our case, we need to constantly write new objects into the database. I simulated a 20k tps write workload with the Java client and a 10-minute TTL. The server slowly runs out of disk and memory.

Dynamic Configuration
asinfo -v "set-config:context=namespace;id=test;defrag-lwm-pct=10"
asinfo -v "set-config:context=namespace;id=test;high-water-disk-pct=30"
asinfo -v "set-config:context=namespace;id=test;high-water-memory-pct=10"
asinfo -v "set-config:context=namespace;id=test;defrag-sleep=10"
asinfo -v "set-config:context=service;nsup-period=10"
asinfo -v "set-config:context=service;nsup-delete-sleep=10"
asinfo -v "set-config:context=namespace;id=test;evict-tenths-pct=900"
asinfo -v "set-config:context=namespace;id=test;memory-size=2368709120"

Live Configuration
$ asinfo -v "namespace/test" | sed 's/;/\n/g'                       
type=device
objects=15507365
sub-objects=0
master-objects=15507365
master-sub-objects=0
prole-objects=0
prole-sub-objects=0
expired-objects=16382369
evicted-objects=183657586
set-deleted-objects=0
nsup-cycle-duration=1052
nsup-cycle-sleep-pct=93
used-bytes-memory=992471360
data-used-bytes-memory=0
index-used-bytes-memory=992471360
sindex-used-bytes-memory=0
free-pct-memory=58
max-void-time=194592967
non-expirable-objects=0
current-time=194592367
stop-writes=false
hwm-breached=true
available-bin-names=32765
used-bytes-disk=3969885440
free-pct-disk=63
available_pct=34
cache-read-pct=26
memory-size=2368709120
high-water-disk-pct=30
high-water-memory-pct=10
evict-tenths-pct=990
stop-writes-pct=91
cold-start-evict-ttl=4294967295
repl-factor=1
default-ttl=600
max-ttl=0
conflict-resolution-policy=generation
single-bin=false
ldt-enabled=false
ldt-page-size=8192
enable-xdr=false
sets-enable-xdr=true
ns-forward-xdr-writes=false
allow-nonxdr-writes=true
allow-xdr-writes=true
disallow-null-setname=false
total-bytes-memory=2368709120
read-consistency-level-override=off
write-commit-level-override=off
migrate-tx-partitions-initial=0
migrate-tx-partitions-remaining=0
migrate-rx-partitions-initial=0
migrate-rx-partitions-remaining=0
migrate-tx-partitions-imbalance=0
total-bytes-disk=10737418240
defrag-lwm-pct=10
defrag-queue-min=1
defrag-sleep=10
defrag-startup-minimum=10
flush-max-ms=1000
fsync-max-sec=0
max-write-cache=67108864
min-avail-pct=5
post-write-queue=256
data-in-memory=false
file=/var/lib/aerospike/data/test.dat
filesize=10737418240
writethreads=1
writecache=67108864
obj-size-hist-max=100

$ asinfo -v "get-config:" | sed 's/;/\n/g'   
transaction-queues=4
transaction-threads-per-queue=4
transaction-duplicate-threads=0
transaction-pending-limit=20
migrate-threads=1
migrate-xmit-priority=5
migrate-xmit-sleep=5000
migrate-read-priority=5
migrate-read-sleep=5000
migrate-xmit-hwm=10
migrate-xmit-lwm=5
migrate-max-num-incoming=256
migrate-rx-lifetime-ms=60000
proto-fd-max=15000
proto-fd-idle-ms=60000
proto-slow-netio-sleep-ms=1
transaction-retry-ms=1000
transaction-max-ms=1000
transaction-repeatable-read=false
ticker-interval=10
log-local-time=false
microbenchmarks=false
storage-benchmarks=false
ldt-benchmarks=false
scan-max-active=100
scan-max-done=100
scan-max-udf-transactions=32
scan-threads=4
batch-index-threads=4
batch-threads=4
batch-max-requests=5000
batch-max-buffers-per-queue=255
batch-max-unused-buffers=256
batch-priority=200
nsup-delete-sleep=10
nsup-period=10
nsup-startup-evict=true
paxos-retransmit-period=5
paxos-single-replica-limit=1
paxos-max-cluster-size=32
paxos-protocol=v3
paxos-recovery-policy=manual
write-duplicate-resolution-disable=true
respond-client-on-master-completion=false
replication-fire-and-forget=false
info-threads=16
allow-inline-transactions=true
use-queue-per-device=false
snub-nodes=false
prole-extra-ttl=0
max-msgs-per-type=-1
service-threads=4
fabric-workers=16
pidfile=/var/run/aerospike/asd.pid
memory-accounting=false
udf-runtime-gmax-memory=18446744073709551615
udf-runtime-max-memory=18446744073709551615
sindex-builder-threads=4
sindex-data-max-memory=ULONG_MAX
query-threads=6
query-worker-threads=15
query-priority=10
query-in-transaction-thread=0
query-req-in-query-thread=0
query-req-max-inflight=100
query-bufpool-size=256
query-batch-size=100
query-priority-sleep-us=1
query-short-q-max-size=500
query-long-q-max-size=500
query-rec-count-bound=18446744073709551615
query-threshold=10
query-untracked-time-ms=1000
query-pre-reserve-partitions=false
service-address=0.0.0.0
service-port=3000
mesh-seed-address-port=10.32.51.38:3002
reuse-address=true
fabric-port=3001
fabric-keepalive-enabled=true
fabric-keepalive-time=1
fabric-keepalive-intvl=1
fabric-keepalive-probes=10
network-info-port=3003
heartbeat-mode=mesh
heartbeat-protocol=v2
heartbeat-address=10.32.51.9
heartbeat-port=3002
heartbeat-interval=250
heartbeat-timeout=10
enable-security=false
privilege-refresh-period=300
report-authentication-sinks=0
report-data-op-sinks=0
report-sys-admin-sinks=0
report-user-admin-sinks=0
report-violation-sinks=0
syslog-local=-1
enable-xdr=false
xdr-namedpipe-path=NULL
forward-xdr-writes=false
xdr-delete-shipping-enabled=true
xdr-nsup-deletes-enabled=false
stop-writes-noxdr=false
reads-hist-track-back=1800
reads-hist-track-slice=10
reads-hist-track-thresholds=1,8,64
writes_master-hist-track-back=1800
writes_master-hist-track-slice=10
writes_master-hist-track-thresholds=1,8,64
proxy-hist-track-back=1800
proxy-hist-track-slice=10
proxy-hist-track-thresholds=1,8,64
udf-hist-track-back=1800
udf-hist-track-slice=10
udf-hist-track-thresholds=1,8,64
query-hist-track-back=1800
query-hist-track-slice=10
query-hist-track-thresholds=1,8,64
query_rec_count-hist-track-back=1800
query_rec_count-hist-track-slice=10
query_rec_count-hist-track-thresholds=1,8,64

#2


#4

:astonished:

I think you would be best off reverting to the server defaults; these settings are likely far from any goals you may have here. You can find a description of these configurations here: http://www.aerospike.com/docs/reference/configuration/.

The main problem here is your defrag-lwm-pct. At 10%, an individual wblock (write block) needs to drop below 10% full to be eligible for defrag, and it seems this isn’t happening. The default is 50%, which is a more aggressive setting. Assuming this resolves your present issue, I would strongly suggest reverting the rest of these settings to defaults.
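To make the defrag threshold concrete, here is a minimal sketch of the eligibility rule described above. The `defrag_eligible` helper is hypothetical and is not the server's code, and the 128K block size and the 20% fill level are assumed values chosen only for illustration.

```shell
# Hypothetical helper, not the server's implementation.
# Succeeds when a write block is below the defrag low-water mark.
# $1 = live bytes in the block, $2 = block size in bytes, $3 = defrag-lwm-pct
defrag_eligible() {
  used_pct=$(( $1 * 100 / $2 ))
  [ "$used_pct" -lt "$3" ]
}

wblock=131072            # assumed 128K write block
live=$(( wblock / 5 ))   # assume the block is about 20% full

for lwm in 10 50; do
  if defrag_eligible "$live" "$wblock" "$lwm"; then
    echo "defrag-lwm-pct=$lwm: eligible"
  else
    echo "defrag-lwm-pct=$lwm: not eligible"
  fi
done
```

Under these assumptions the same block is skipped at defrag-lwm-pct=10 and reclaimed at 50, which is why the lower setting lets free blocks run out sooner.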

  1. What problem were you having with the standard configuration that led you here?
  2. If you have a 10-minute TTL, you probably do not need eviction. Expiration will take care of removing the data. You will probably want nsup to run more frequently, so begin by setting nsup-period to 10, as you have.
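As a rough check on why eviction kicks in at all, the sketch below estimates the steady-state footprint if only expiration removes data. Every input is copied from post #1; the per-record sizes are derived by dividing the posted usage figures by the posted object count, which is an assumption that all records are about the same size.

```shell
# Inputs copied from the stats in post #1.
tps=20000                 # simulated write rate
ttl=600                   # default-ttl, in seconds
objects=15507365          # objects
index_bytes=992471360     # index-used-bytes-memory
disk_bytes=3969885440     # used-bytes-disk
memory_size=2368709120    # memory-size
disk_size=10737418240     # total-bytes-disk
hwm_mem_pct=10            # high-water-memory-pct
hwm_disk_pct=30           # high-water-disk-pct

# Per-record cost, derived from the posted figures (assumes uniform records).
index_per_rec=$(( index_bytes / objects ))
disk_per_rec=$(( disk_bytes / objects ))

# Records alive at any moment if each one lives for exactly its TTL.
live_records=$(( tps * ttl ))
need_mem=$(( live_records * index_per_rec ))
need_disk=$(( live_records * disk_per_rec ))

# High-water marks in bytes.
hwm_mem=$(( memory_size * hwm_mem_pct / 100 ))
hwm_disk=$(( disk_size * hwm_disk_pct / 100 ))

echo "live records: $live_records"
echo "memory: need $need_mem bytes, HWM at $hwm_mem bytes"
echo "disk:   need $need_disk bytes, HWM at $hwm_disk bytes"
if [ "$need_mem" -gt "$hwm_mem" ]; then echo "memory HWM is breached at steady state"; fi
if [ "$need_disk" -gt "$hwm_disk" ]; then echo "disk HWM is breached at steady state"; fi
```

With the posted numbers, the index alone needs more memory than a 10% memory high-water mark allows, so the node stays in eviction even though the 10-minute TTL would otherwise bound the data set.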

#5

I stopped the server, deleted the data file, and restarted from scratch with everything left at the defaults. Stop-writes (SW) still happens.


#6


#7

Try reducing nsup-period to 1.

Then try setting nsup-delete-sleep to 0 (this may be a problem in production).

Then try setting defrag-sleep to 0.

What is the size of your records?

Do you write once and never update/overwrite?
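The three settings suggested above can be applied one at a time with the same `asinfo` set-config syntax used in post #1. The sketch below is a dry run by default and only prints the commands; the `set_config` helper and the `DRY_RUN` switch are hypothetical names for illustration, and the namespace `test` is taken from the original post.

```shell
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 to run against a local node (assumes asinfo is on PATH).
DRY_RUN="${DRY_RUN:-1}"
NS="test"   # namespace name from post #1

# Hypothetical helper. $1 = context and parameter, e.g. "service;nsup-period=1"
set_config() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'asinfo -v "set-config:context=%s"\n' "$1"
  else
    asinfo -v "set-config:context=$1"
  fi
}

# Apply one step, observe the node for a while, then move to the next.
step1=$(set_config "service;nsup-period=1")
step2=$(set_config "service;nsup-delete-sleep=0")
step3=$(set_config "namespace;id=$NS;defrag-sleep=0")
printf '%s\n' "$step1" "$step2" "$step3"
```

Changing one parameter per step makes it easier to tell which setting, if any, moves the available percentage.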


#8

I tried the following settings one by one.

nsup-period = 1
nsup-delete-sleep = 1   or 10 
defrag-sleep = 1

Yes, we almost never overwrite data (99% of the time).

// naive Scala code; assumes `client` (AerospikeClient) and `writePolicy` are set up elsewhere
import com.aerospike.client.{Bin, Key}

// 4000 x 300 = 1.2 million distinct id prefixes
val array = for {
  i <- 1 to 4000
  j <- 1 to 300
} yield f"aaaa$i%05d.bbbbb$j%05d"

var mm = scala.util.Random.shuffle(array)

// hand out the next prefix; reshuffle the pool once it is exhausted
def pop(): String = synchronized {
  if (mm.isEmpty) {
    mm = scala.util.Random.shuffle(array)
  }
  val h = mm.head
  mm = mm.tail
  h
}

// create 100 threads to write data; each thread is limited to tps_per_thread
val tps_target = 20000
val threads = 100
val tps_per_thread = tps_target / threads + 1

// per-thread state
var cum = 0
var start = System.currentTimeMillis()
var ts = start / 1000
ts = ts - ts % 60 // round down to the minute

// body of each thread's write loop
val id = f"${pop()}~$ts%010d" // built once so the key and the _id bin match
val key = new Key("test", "datapoints", id)
val bin_id = new Bin("_id", id)
val bin_ts = new Bin("ts", ts)
val bin_value = new Bin("v", 1.32)

try {
  client.put(writePolicy, key, bin_id, bin_ts, bin_value)
  // } catch {
  //   case e: Throwable => println("Got some error: " + e)
} finally {
  cum += 1
  if (cum >= tps_per_thread) {
    // sleep out the rest of the current second
    val elapsed = System.currentTimeMillis() - start
    if (elapsed < 1000) {
      Thread.sleep(1000 - elapsed)
    }
    start = System.currentTimeMillis()
    ts = start / 1000
    ts = ts - ts % 60
    cum = 0
  }
}

#9

What was the observation with these settings?

Could you provide the output of asadm -e info at the time of a problem?


#10

asadm -e info

At the time of the problem, are reads stopped as well?


#11

On another machine, I’m trying not to expire objects and to keep reusing the same object pool (1.2 million in the test). Most of the time the server works great. But after I set write-block-size to 128K, Avail % in AMC became more volatile.

Using the same object pool, with updates instead of inserts, it seems stable with write-block-size = 128K.


#12

But I still need to configure the HWM, LWM, … carefully. One misconfiguration will break the balance …

E.g., Avail % is volatile. Getting the configuration right takes too much attention.


#13

Looks like this wasn’t targeting one of your nodes.

You would either need to run asadm on a node or specify the node’s address with the ‘-h’ option.

asadm -h AEROSPIKE_NODE_ADDRESS -e info