The interval for flushing data to disk

by Hanson » Wed Jul 30, 2014 10:41 pm

I’m running pure write traffic (30K TPS, 1 KB record size) with Aerospike 3.3.8 in-memory with disk persistence. iostat shows ~15 seconds of ~120 MB/s disk writes in every 60-second interval, and 0 MB/s for the remaining 60 − 15 = 45 seconds.

Are there any parameters that can be tuned to smooth out the disk activity? (i.e., spread the disk writes out to 15 × 120 / 60 = 30 MB/s instead of 120 MB/s spikes)

The purpose:

  • Reduce data loss in case of power failure, trading off performance against durability (the D in ACID).
  • In the cloud, reduce the disk I/O impact VMs have on each other when two VMs have block storage on the same physical disk.
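
For reference, the burst pattern above can be observed with something like the following (the device name is just an example):

[root@localhost]# iostat -mx sdb 1 120

With pure write traffic, wMB/s sits near 0 for ~45 seconds of each minute and then jumps to ~120 MB/s for ~15 seconds.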

Speeding up defragmentation should help spread out the disk I/O.

Try setting these to a value of 1:

defrag-period=1

defrag-queue-priority=1

These changes can be made permanent in the aerospike.conf file.
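
As a minimal sketch of where these would go in aerospike.conf — assuming defrag-period belongs in the namespace’s storage-engine block and defrag-queue-priority in the service block, with the namespace name and device path below as placeholders (check the configuration reference for your version):

service {
    defrag-queue-priority 1    # drain the defrag queue more aggressively
}

namespace test {
    storage-engine device {
        device /dev/sdb        # placeholder device
        data-in-memory true
        defrag-period 1        # run the defrag sweep every second
    }
}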

Or you can dynamically test it using asinfo:

asinfo -v 'set-config:context=namespace;id=test;defrag-period=1'
asinfo -v 'set-config:context=service;defrag-queue-priority=1'
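
To verify what was applied, you can read the current values back, e.g.:

asinfo -v 'get-config:context=namespace;id=test'
asinfo -v 'get-config:context=service'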

More info on these settings and others can be found at:

http://www.aerospike.com/docs/reference/configuration/

There is no parameter “defrag-period” in Aerospike 3.3.8. I only found the following: defrag-queue-hwm=500;defrag-queue-lwm=1;defrag-queue-escape=10;defrag-queue-priority=1

Here is the full list:

===================================
[root@localhost]# asinfo -v 'get-config:context=service'
requested value get-config:context=service
value is transaction-queues=4;transaction-threads-per-queue=4;transaction-duplicate-threads=0;transaction-pending-limit=20;
migrate-threads=1;migrate-priority=40;migrate-xmit-priority=40;migrate-xmit-sleep=500;migrate-read-priority=10;migrate-read-sleep=500;
migrate-xmit-hwm=10;migrate-xmit-lwm=5;migrate-max-num-incoming=256;migrate-rx-lifetime-ms=60000;proto-fd-max=15000;
proto-fd-idle-ms=60000;transaction-retry-ms=1000;transaction-max-ms=1000;transaction-repeatable-read=false;dump-message-above-size=134217728;ticker-interval=10;microbenchmarks=false;storage-benchmarks=false;scan-priority=200;scan-sleep=1;batch-threads=4;
batch-max-requests=5000;batch-priority=200;nsup-period=120;nsup-queue-hwm=500;nsup-queue-lwm=1;nsup-queue-escape=10;defrag-queue-hwm=500;defrag-queue-lwm=1;defrag-queue-escape=10;defrag-queue-priority=1;nsup-auto-hwm-pct=15;nsup-startup-evict=true;
paxos-retransmit-period=5;paxos-single-replica-limit=1;paxos-max-cluster-size=32;paxos-protocol=v3;paxos-recovery-policy=manual;
write-duplicate-resolution-disable=false;respond-client-on-master-completion=false;replication-fire-and-forget=false;info-threads=16;
allow-inline-transactions=true;use-queue-per-device=false;snub-nodes=false;fb-health-msg-per-burst=0;fb-health-msg-timeout=200;
fb-health-good-pct=50;fb-health-bad-pct=0;auto-dun=false;auto-undun=false;prole-extra-ttl=0;max-msgs-per-type=-1;
pidfile=/var/run/aerospike/asd.pid;memory-accounting=false;udf-runtime-gmax-memory=18446744073709551615;
udf-runtime-max-memory=18446744073709551615;sindex-populator-scan-priority=3;sindex-data-max-memory=18446744073709551615;
query-threads=6;query-worker-threads=15;query-priority=10;query-in-transaction-thread=0;query-req-in-query-thread=0;
query-req-max-inflight=100;query-bufpool-size=256;query-batch-size=100;query-sleep=1;query-job-tracking=false;
query-short-q-max-size=500;query-long-q-max-size=500;query-rec-count-bound=4294967295;query-threshold=10

by Hanson » Tue Aug 05, 2014 11:58 pm

The problem is that there is no disk writing during those 45 seconds, which means all the data changed in memory over that window will be lost on power failure. That is a large amount of data under high insert traffic: 30K TPS × 45 s = 1,350K records! Can the data loss be minimized?