What is the Difference Between Storage Engine Memory and Data-In-Memory?



Synopsis

What is the difference between a namespace configured for data-in-memory true and one configured for storage-engine memory?

Storage Engine

Every namespace can be identified by the storage-engine it is configured with. A namespace configured with storage-engine device is known as a persisted namespace, because its data is stored either on a disk (a raw device) or in a file. In either case, data survives even if the server stops unexpectedly.

With data-in-memory set to false, a persisted namespace keeps only its indexes in memory. All record data is stored on the disk or in the file, for example /opt/aerospike/data/bar.dat.

The following example illustrates a persisted namespace on a device:

namespace bar {
        replication-factor 2
        memory-size 4G
        default-ttl 30d # 30 days, use 0 to never expire/evict.

        storage-engine device {
                device /dev/sdb
                data-in-memory false
        }
}

For a file-backed namespace, configuration is as follows:

storage-engine device {
        file /opt/aerospike/data/bar.dat
        filesize 16G
        data-in-memory false
}

A namespace that is configured to use storage-engine memory does not store data on disk. If the server fails, the data stored in memory is lost.

The following example shows a namespace configured to use storage-engine memory:

namespace test {
        replication-factor 2
        memory-size 4G
        default-ttl 30d # 30 days, use 0 to never expire/evict.

        storage-engine memory
}

Data-in-Memory

A namespace that is configured for data-in-memory stores all data in memory, as you would expect. The difference between data-in-memory and storage-engine memory is that a namespace with data-in-memory set to true also persists its data on disk and, as of server version 3.15.1.3, keeps its index in shared memory during a normal asd process restart. Keeping the index across a restart also eliminates the risk of deleted records being re-indexed during that restart.

The following example illustrates this point:

namespace bar {
        replication-factor 2
        memory-size 4G
        default-ttl 30d # 30 days, use 0 to never expire/evict.

        storage-engine device {
                file /opt/aerospike/data/bar.dat
                filesize 16G
                data-in-memory true # Store data in memory in addition to file.
        }
}

We see that this namespace stores data in the file /opt/aerospike/data/bar.dat, and also holds the same data in memory.
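
As a rough illustration of how the three configurations discussed here differ, the following Python sketch reads a namespace stanza as text and labels it. It is not part of Aerospike: the function name classify_namespace and the label strings are hypothetical, and the matching is a simple line-based check rather than a real parser of the configuration format.

```python
import re


def classify_namespace(stanza: str) -> str:
    """Label how a namespace stanza stores its data (illustrative only)."""
    # Drop trailing '#' comments and surrounding whitespace on every line.
    lines = [line.split("#", 1)[0].strip() for line in stanza.splitlines()]
    text = "\n".join(lines)

    # storage-engine memory: nothing is written to disk.
    if re.search(r"^storage-engine\s+memory\b", text, re.MULTILINE):
        return "memory only, not persisted"

    # storage-engine device: persisted on a device or in a file.
    if re.search(r"^storage-engine\s+device\b", text, re.MULTILINE):
        match = re.search(r"^data-in-memory\s+(true|false)\b", text, re.MULTILINE)
        if match is None:
            # Assumption: no claim is made here about the server default.
            return "persisted, data-in-memory not set explicitly"
        if match.group(1) == "true":
            return "persisted, data also held in memory"
        return "persisted, only indexes in memory"

    return "unknown"


if __name__ == "__main__":
    example = """
    namespace bar {
            storage-engine device {
                    file /opt/aerospike/data/bar.dat
                    filesize 16G
                    data-in-memory true # Store data in memory in addition to file.
            }
    }
    """
    print(classify_namespace(example))
```

Run against the three stanzas in this article, the sketch should separate the memory-only namespace from the two persisted variants.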

In terms of system load, the difference between the two configurations should be negligible if they are sized correctly. In some cases, though, depending on the system configuration and especially on swappiness and memory-related tuning, persisted namespaces with data-in-memory set to true may experience occasional spikes in load and latency while the system manages the file cache involved in storing the data in a file. Finally, namespaces in memory without persistence do not have a limit on record size, whereas persisted namespaces limit the record size to the configured write-block-size.
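
To make the record size limit concrete, here is a minimal Python sketch that compares a record size against a write-block-size value. The helper names parse_size and fits_in_write_block are hypothetical, the sketch assumes that the K, M and G suffixes denote powers of 1024, and it ignores any per-record storage overhead, so treat it as a rough sanity check only.

```python
# Assumption: size suffixes in the configuration are powers of 1024.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a size such as '128K' or '1M' to a number of bytes."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def fits_in_write_block(record_bytes: int, write_block_size: str) -> bool:
    """Return True if the record is no larger than the write block.

    Per-record overhead is deliberately ignored in this sketch.
    """
    return record_bytes <= parse_size(write_block_size)


if __name__ == "__main__":
    for block in ("128K", "1M"):
        print(block, fits_in_write_block(200_000, block))
```

In this sketch a 200,000-byte record does not fit a 128K write block but does fit a 1M one; a memory-only namespace would not be subject to this check at all.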

More details on the different configuration recipes can be found on the following page: http://www.aerospike.com/docs/operations/configure/namespace/storage/