What is the Difference Between Storage Engine Memory and Data-In-Memory?



Synopsis

What is the difference between a namespace configured for data-in-memory true and one configured for storage-engine memory?

Storage Engine

Every namespace is defined by the storage-engine it is configured with. A namespace configured with storage-engine device is known as a persisted namespace, because its data is stored either on a raw disk device or in a file. In either case, the data survives even if the server stops unexpectedly.

In the examples below, data is persisted either on the device /dev/sdb or in the file /opt/aerospike/data/bar.dat. Note that with data-in-memory set to false, only the indexes are kept in memory. All record data is stored on the device or in the file.

The following example illustrates a persisted namespace on a device:

namespace bar {
        replication-factor 2
        memory-size 4G
        default-ttl 30d # 30 days, use 0 to never expire/evict.

        storage-engine device {
                device /dev/sdb
                data-in-memory false
        }
}

For a file-backed namespace, configuration is as follows:

        storage-engine device {
                file /opt/aerospike/data/bar.dat
                filesize 16G
                data-in-memory false
        }

A namespace that is configured to use storage-engine memory does not store data on disk. If the server fails, the data stored in memory is lost.

The following example shows a namespace configured to use storage-engine memory:

namespace test {
        replication-factor 2
        memory-size 4G
        default-ttl 30d # 30 days, use 0 to never expire/evict.

        storage-engine memory
}

Data-in-Memory

A namespace that is configured with data-in-memory true keeps all of its data in memory, as you would expect. The difference from storage-engine memory is that data-in-memory is an option of storage-engine device, so the namespace also persists its data to a device or file. In addition, as of server version 3.15.1.3, such a namespace keeps its index in shared memory across a normal restart of the asd process. This also eliminates the risk of deleted records being re-indexed during a restart.

The following example illustrates this point:

namespace bar {
        replication-factor 2
        memory-size 4G
        default-ttl 30d # 30 days, use 0 to never expire/evict.

        storage-engine device {
                file /opt/aerospike/data/bar.dat
                filesize 16G
                data-in-memory true # Store data in memory in addition to file.
        }
}

We see that this namespace stores data in the file /opt/aerospike/data/bar.dat, and also holds the same data in memory.
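
The three configurations discussed so far can be told apart mechanically. The following Python sketch is for illustration only: classify_namespace and the labels it returns are hypothetical names, not part of any Aerospike tool, and the parsing assumes the simple stanza layout used in this article. If data-in-memory is not set explicitly, the sketch does not guess the server default.

```python
import re


def classify_namespace(stanza: str) -> str:
    """Return a label describing where a namespace keeps its record data.

    Hypothetical helper for illustration; it understands only the simple
    stanza layout used in this article.
    """
    # Drop comments and surrounding whitespace so directives start each line.
    lines = [line.split("#", 1)[0].strip() for line in stanza.splitlines()]
    text = "\n".join(lines)

    if re.search(r"^storage-engine\s+memory\b", text, re.MULTILINE):
        return "memory-only"
    if not re.search(r"^storage-engine\s+device\b", text, re.MULTILINE):
        raise ValueError("no storage-engine directive found")

    match = re.search(r"^data-in-memory\s+(true|false)\b", text, re.MULTILINE)
    if match is None:
        # Do not guess the server default; report it as unspecified.
        return "persisted (data-in-memory not set explicitly)"
    if match.group(1) == "true":
        return "persisted + data in memory"
    return "persisted, index only in memory"


example = """
namespace bar {
        storage-engine device {
                file /opt/aerospike/data/bar.dat
                filesize 16G
                data-in-memory true # Store data in memory in addition to file.
        }
}
"""
print(classify_namespace(example))
```

Only the last label corresponds to a namespace whose data is lost on an unexpected stop; the other two differ in whether reads are served from memory or from the storage subsystem.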

In terms of system load, the difference between the two configurations (storage-engine memory and data-in-memory true) should be negligible if the system is sized correctly. In some cases, however, depending on the system configuration and especially on swappiness and other memory-related tuning, persisted namespaces with data-in-memory set to true may experience occasional spikes in load and latency while the system manages the file cache involved in storing the data in a file. Finally, in-memory namespaces without persistence have no limit on record size, whereas persisted namespaces limit the record size to the configured write-block-size.
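
The record-size limit mentioned above can be illustrated with a small Python check. This is a rough sketch, not Aerospike's actual accounting: parse_size and record_fits are hypothetical helpers, the K/M/G suffixes are assumed to be powers of 1024, and any per-record storage overhead is ignored.

```python
from typing import Optional

# Assumption: size suffixes are binary multiples (K = 1024 bytes, and so on).
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a config-style size such as '128K' or '1M' to bytes."""
    value = value.strip().upper()
    if value and value[-1] in _UNITS:
        return int(value[:-1]) * _UNITS[value[-1]]
    return int(value)


def record_fits(record_bytes: int, write_block_size: Optional[str]) -> bool:
    """Check a record size against the namespace's write-block-size.

    Pass write_block_size=None to model a memory-only namespace, which
    per this article has no record-size limit.
    """
    if write_block_size is None:
        return True
    return record_bytes <= parse_size(write_block_size)


# A 200,000-byte record against a hypothetical 128K write-block-size.
print(record_fits(200_000, "128K"))
print(record_fits(200_000, None))
```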

Note

Please be aware that the above configuration (data-in-memory true with disk persistence) should not be used as a workaround if the storage subsystem is not capable of handling the write load by itself. The examples below should help explain why.

  1. For workloads that do not read a record prior to updating it (pure replaces or new record creations), if the storage subsystem is not capable of handling the workload when data-in-memory is set to false, then moving to data-in-memory true with the same disk for persistence will not help and will most likely lead to "queue too deep" issues.

  2. For read-update workloads, moving to a data-in-memory true configuration will help, as the reads will be served from memory rather than from the storage subsystem.
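
The two cases above can be made concrete with a deliberately simplified model of device I/O. This Python sketch rests on stated assumptions: every update is written to the device once, a read-before-update hits the device only when data-in-memory is false, and effects such as defragmentation, replication, and caching are ignored. The function name and its parameters are hypothetical.

```python
def device_io_per_second(updates_per_sec: int,
                         reads_before_write: bool,
                         data_in_memory: bool) -> dict:
    """Estimate device reads and writes per second under a toy model."""
    # Writes always reach the device because the namespace is persisted.
    writes = updates_per_sec
    # Reads reach the device only if the workload reads first AND the
    # data is not already held in memory.
    reads = updates_per_sec if (reads_before_write and not data_in_memory) else 0
    return {"device_reads": reads, "device_writes": writes}


for reads_first in (False, True):
    for in_memory in (False, True):
        io = device_io_per_second(10_000, reads_first, in_memory)
        print(f"read-before-write={reads_first} data-in-memory={in_memory} -> {io}")
```

In this model the device write rate is identical in all four cases, which is why data-in-memory true cannot rescue a storage subsystem that is already unable to keep up with writes. Only the device read rate of the read-update workload drops.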

It is always good practice to benchmark the different configurations under a relevant workload before making a decision for production use.

More details on the different configuration recipes can be found on the following page: http://www.aerospike.com/docs/operations/configure/namespace/storage/