The interval to flush changed data from memory into disk?


#1

by Hanson » Thu Jul 24, 2014 8:02 pm

I’m using Aerospike 3.3.8 In-Memory with disk Persistence.

I inserted a new record and then checked the persistence data file, but its timestamp had still not changed long after the insert.

It seems Aerospike does not flush changed data from memory to disk unless the buffer is full.

Shouldn’t it also flush changed data from memory to disk on a time interval? E.g.: flush when either the buffer is full or more than 1 second has passed since the last flush.
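The policy asked for here (flush when the buffer is full, or when too much time has passed since the last flush) can be sketched as below. This is only an illustration of the idea, not Aerospike code: the class name `FlushingBuffer`, its parameters, and the `sink` callable standing in for a disk write are all hypothetical.

```python
import time


class FlushingBuffer:
    """Hypothetical buffer: flush when full OR when the flush interval has elapsed."""

    def __init__(self, sink, max_bytes=128 * 1024, max_interval_s=1.0,
                 clock=time.monotonic):
        self.sink = sink                    # callable taking bytes; stands in for a disk write
        self.max_bytes = max_bytes          # size trigger
        self.max_interval_s = max_interval_s  # time trigger, measured since the last flush
        self.clock = clock
        self._chunks = []
        self._size = 0
        self._last_flush_at = clock()

    def write(self, data):
        """Buffer the data, then flush if either trigger has fired."""
        self._chunks.append(data)
        self._size += len(data)
        self.maybe_flush()

    def maybe_flush(self):
        """Return True if a flush happened. A timer would also call this periodically."""
        if self._size == 0:
            return False
        full = self._size >= self.max_bytes
        overdue = self.clock() - self._last_flush_at >= self.max_interval_s
        if full or overdue:
            self.flush()
            return True
        return False

    def flush(self):
        """Hand all pending data to the sink and restart the interval."""
        if self._chunks:
            self.sink(b"".join(self._chunks))
        self._chunks = []
        self._size = 0
        self._last_flush_at = self.clock()
```

With this shape, a low-traffic node loses at most about `max_interval_s` worth of writes on a crash, provided something calls `maybe_flush()` on a timer when no new writes arrive.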

The changed data is persisted if I restart Aerospike with “service aerospike restart”, but it is lost if I use “kill -9 ”.


How Aerospike writes data to disk
#2

by bbulkow » Thu Jul 31, 2014 2:12 pm

Aerospike flushes based on write volume. By default, writes are flushed every 128K. In most reasonable environments, this is a window of less than 10 milliseconds. Since Aerospike does not use the file system by default (direct device access), our use of O_SYNC against the raw device means writes are written through immediately instead of being cached by the filesystem or on a page basis.
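To make the O_SYNC point concrete, here is a minimal sketch of a synchronous block write. It is an assumption-laden illustration, not Aerospike's implementation: it writes to an ordinary temporary file rather than a raw device, it is Unix-only (`os.O_SYNC`), and the names `write_block_synchronously` and `demo` are made up for this example.

```python
import os
import tempfile

BLOCK_SIZE = 128 * 1024  # 128 KiB, the default flush unit mentioned in this post


def write_block_synchronously(path, block):
    """Write one block through a descriptor opened with O_SYNC.

    With O_SYNC, each write() returns only after the kernel has transferred
    the data to the underlying storage, instead of leaving it in the page cache.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    try:
        written = 0
        while written < len(block):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, block[written:])
        return written
    finally:
        os.close(fd)


def demo():
    """Write one zero-filled block to a temp file; return (bytes written, file size)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "block.dat")
        n = write_block_synchronously(path, bytes(BLOCK_SIZE))
        return n, os.path.getsize(path)
```

Note that O_SYNC only governs the operating system's side; a write cache inside the storage device itself is a separate matter.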

We catch many signals, including SIGTERM (-9), and flush pending writes, so the system does not “lose data” if terminated with a kill -9. In the open source code, you can see our handling of SIGTERM in signal.c at line 61. The lock in question is held around line 440 of as.c; releasing it unblocks the main thread, which then runs the process shutdown routines, including as_storage_shutdown, which flushes pending buffers.
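The general pattern described here (catch SIGTERM, flush what is pending, then shut down) can be sketched as follows. This is not the code in signal.c or as.c; `PendingWrites`, `make_sigterm_handler`, and `install_sigterm_flush` are hypothetical names used only for illustration.

```python
import signal


class PendingWrites:
    """Hypothetical stand-in for a server's unflushed write buffers."""

    def __init__(self):
        self.pending = []                # records accepted but not yet persisted
        self.on_disk = []                # pretend persistent storage
        self.shutdown_requested = False  # the main loop would poll this flag

    def flush(self):
        """Move everything pending into the pretend persistent storage."""
        self.on_disk.extend(self.pending)
        self.pending.clear()


def make_sigterm_handler(writes):
    """Build a handler that flushes pending writes and requests shutdown."""
    def handler(signum, frame):
        writes.flush()
        writes.shutdown_requested = True
    return handler


def install_sigterm_flush(writes):
    """Register the handler for SIGTERM; must be called from the main thread."""
    return signal.signal(signal.SIGTERM, make_sigterm_handler(writes))
```

A handler like this only helps for signals the process is allowed to catch.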

Of course, as a clustered and distributed system, we expect in production environments you will run with multiple servers. That redundancy greatly improves practical reliability, as well.


#3

by Hanson » Wed Aug 06, 2014 12:48 am

The problem is that the flush interval is too long (or is there never a flush if the buffer is not full?), which increases the chance of data loss. I used “kill -9” only to simulate a system crash (such as a power loss at the single-node or cluster level).

“kill -9” sends SIGKILL, not SIGTERM. SIGKILL cannot be caught by a handler installed with signal(): http://stackoverflow.com/questions/3908 … al-handler
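This is easy to check from user space. The sketch below, written in Python rather than C for brevity, tries to install a handler for each signal and reports whether the operating system allows it. The helper name `can_install_handler` is made up for this example, and it assumes a Unix system and that it runs in the main thread.

```python
import signal


def can_install_handler(signum):
    """Return True if the OS lets this process install a handler for signum."""
    def _handler(received_signum, frame):
        pass

    try:
        previous = signal.signal(signum, _handler)
    except OSError:
        # The kernel rejects handlers for SIGKILL (and SIGSTOP) with EINVAL.
        return False
    # Restore whatever was installed before, so the check has no lasting effect.
    signal.signal(signum, previous if previous is not None else signal.SIG_DFL)
    return True


# On Linux, SIGTERM is signal 15 and SIGKILL is signal 9.
sigterm_catchable = can_install_handler(signal.SIGTERM)
sigkill_catchable = can_install_handler(signal.SIGKILL)
```

So a flush-on-shutdown handler can protect against “kill” (SIGTERM) but not against “kill -9”, which behaves like a crash as far as buffered data is concerned.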


#4

by Hanson » Wed Aug 06, 2014 2:45 am

I observed that the timestamp of the persistence data file is updated every 372 seconds with insert traffic at 1 TPS and a record size of 1KB. Does that mean the buffer size is 372KB? And data is never flushed from memory to the disk file before the buffer is full.

[ainet@localhost data]$ ls -l --time-style=full-iso test.dat
-rw------- 1 root root 68719476736 2014-08-06 17:17:57.772677373 +0800 test.dat
[ainet@localhost data]$ date
Wed Aug  6 17:22:47 CST 2014
[ainet@localhost data]$ ls -l --time-style=full-iso test.dat
-rw------- 1 root root 68719476736 2014-08-06 17:17:57.772677373 +0800 test.dat
[ainet@localhost data]$ date
Wed Aug  6 17:23:11 CST 2014
[ainet@localhost data]$ ls -l --time-style=full-iso test.dat
-rw------- 1 root root 68719476736 2014-08-06 17:17:57.772677373 +0800 test.dat
[ainet@localhost data]$ date
Wed Aug  6 17:24:04 CST 2014
[ainet@localhost data]$ ls -l --time-style=full-iso test.dat
-rw------- 1 root root 68719476736 2014-08-06 17:24:10.028676974 +0800 test.dat

#5

by Hanson » Tue Aug 19, 2014 12:56 am

Is there any parameter to change the flush interval?


#6

by bbulkow » Thu Aug 21, 2014 9:29 am

The buffer size is configurable; the flush interval is not yet. We will be exposing that soon.

The buffer size is controlled with write-block-size on a per-namespace basis: http://www.aerospike.com/docs/reference/configuration/

Aerospike has a variety of overhead - we store the key, various timeout parameters, the vector clock, and round to the closest 128B, so the size of the buffer will be greater than the size of your test data.

Remember that you have replica servers, and by default writes are synchronous. Most of our deployments are for higher write load cases, where there is often more than 100MB/sec of writes. At those rates, with 128KB buffers, the flush interval is very short, and we can use O_SYNC at the device level, so there is no chance of any OS buffering; it is up to you to make sure your storage device is not buffering. You must do ALL of this if you really care about minimal data loss.
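As a rough back-of-the-envelope check of these figures, the time to fill one buffer is simply the buffer size divided by the write rate. The sketch below is idealized: it ignores the per-record overhead mentioned above, and the helper name `seconds_to_fill` is made up for this example.

```python
def seconds_to_fill(buffer_bytes, write_bytes_per_second):
    """Idealized time until a size-triggered flush, ignoring per-record overhead."""
    return buffer_bytes / write_bytes_per_second


KIB = 1024
MB = 1_000_000
BUFFER = 128 * KIB  # the 128KB buffer size quoted in this post

# Heavy production load quoted above: 100 MB/sec of writes.
high_load_s = seconds_to_fill(BUFFER, 100 * MB)

# The test earlier in this thread: 1 record per second, 1KB per record.
low_load_s = seconds_to_fill(BUFFER, 1 * KIB)

print(f"high load: {high_load_s * 1000:.2f} ms per flush")
print(f"low load:  {low_load_s:.0f} s per flush")
```

At 100 MB/sec a 128KB buffer fills in roughly 1.3 milliseconds, while at 1KB per second the same buffer would take about 128 seconds, which is why a purely size-triggered flush looks very different under light traffic.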


#7

by Hanson » Thu Aug 21, 2014 7:16 pm

I agree with your point that a cluster with data replicas and Rack Awareness could overcome this issue. It is still serious for a single node if the server crashes (power loss, etc.).

I would expect it to flush changed data from memory to disk when either the buffer is full or more than x seconds have passed since the last flush. This would minimize the chance of data loss on a single node.

There is no parameter “write-block-size” in Aerospike 3.3.12:

[root@localhost data]# asinfo -v 'get-config:context=namespace;id=test'
requested value get-config:context=namespace;id=test
value is sets-enable-xdr=true;memory-size=53687091200;low-water-pct=0;high-water-disk-pct=50;
high-water-memory-pct=60;evict-tenths-pct=5;stop-writes-pct=90;cold-start-evict-ttl=4294967295;
repl-factor=1;default-ttl=2592000;max-ttl=0;conflict-resolution-policy=generation;allow_versions=false;
single-bin=false;enable-xdr=false;disallow-null-setname=false;total-bytes-memory=53687091200;
total-bytes-disk=68719476736;defrag-period=1;defrag-max-blocks=4000;defrag-lwm-pct=50;
write-smoothing-period=0;defrag-startup-minimum=10;max-write-cache=67108864;min-avail-pct=5;
post-write-queue=0;data-in-memory=true;file=/instances/aerospike/data/test.dat;filesize=68719476736;
writethreads=1;writecache=67108864;obj-size-hist-max=100

That web page you mentioned is out of date for some parameters.