Aerospike in-memory DB uses much more memory than expected


#1

Hello, I have an Aerospike database configured to keep data in memory. My cluster consists of 5 servers and has 2 namespaces.

First namespace: memory-size 4G, high-water-memory-pct 80

Second namespace: memory-size 17G, high-water-memory-pct 80

So I expect about 17 GB of RAM to be occupied.
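That estimate is each namespace's memory-size multiplied by its high-water mark, summed over both namespaces. A minimal Python sketch of the arithmetic, using only the values quoted in this post (the function name is illustrative, not part of any Aerospike tool):

```python
def hwm_budget_gb(namespaces):
    """Sum memory-size * high-water-memory-pct over all namespaces, in GB."""
    return sum(size_gb * hwm_pct / 100 for size_gb, hwm_pct in namespaces)

# (memory-size in GB, high-water-memory-pct) as quoted in the post
namespaces = [(4, 80), (17, 80)]

expected = hwm_budget_gb(namespaces)
print(f"expected data memory at the high-water mark: {expected:.1f} GB")
```

This only covers the memory the namespaces may fill before eviction starts. It says nothing about what else the server process allocates.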

In fact Aerospike uses 21.5 GB, which is highly unexpected.

P.S. The cluster evicts data by the HWM policy, because the capacity is not enough to keep records for the provided TTL.


#2

Have you read the capacity planning page? http://www.aerospike.com/docs/operations/plan/capacity

Also a user, @GeertJohan, has contributed a tool for capacity planning discussed here:


#3

Thank you for the reply. Will check it.

I forgot to mention: if I stop the instance and start it again, memory usage drops to 19 GB.

Also, according to the info data, memory usage is: ns1: 3.189 GB, ns2: 12.737 GB.

So I guess that the extra 2.5 GB is taken by “dead” records after they were evicted. The question is how to force Aerospike to release this space without a database stop/start procedure.
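For reference, the figures quoted in this post can be lined up as follows. This is only a sketch of the arithmetic. It treats the reported sizes as GB, and the variable names are illustrative.

```python
# Figures quoted in the thread, in GB
process_before_restart_gb = 21.5
process_after_restart_gb = 19.0
data_gb = {"ns1": 3.189, "ns2": 12.737}

# Memory usage reported by info, summed over both namespaces
data_total = sum(data_gb.values())

# Memory that comes back only after a stop/start
released_by_restart = process_before_restart_gb - process_after_restart_gb

# Process memory not explained by the reported namespace usage, even after a restart
non_data_after_restart = process_after_restart_gb - data_total

print(f"namespace usage reported by info: {data_total:.3f} GB")
print(f"released by restart:              {released_by_restart:.3f} GB")
print(f"non-data memory after restart:    {non_data_after_restart:.3f} GB")
```

So two gaps need explaining: the 2.5 GB that only a restart releases, and roughly 3 GB that the process holds beyond the reported namespace usage even right after a restart.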


#4

Can you share your namespace configuration stanzas?


#5
namespace ns1 {

    replication-factor 2
    memory-size 4G
    high-water-memory-pct 80
    stop-writes-pct 99
    default-ttl 2d

    migrate-sleep 0
    evict-tenths-pct 10
    evict-hist-buckets 50000

    storage-engine memory
}

namespace ns2 {

    replication-factor 2
    memory-size 17G
    high-water-memory-pct 75
    stop-writes-pct 99
    default-ttl 10d

    migrate-sleep 0
    evict-tenths-pct 10
    evict-hist-buckets 50000

    storage-engine memory
}

#6

Did you build any secondary indexes? With data in memory, evicted records immediately release their memory back to the process (in CE, the primary index is also stored in RAM). The total RAM consumed by the Aerospike process is roughly 1 GB for the process itself, plus RAM for the primary index (PI: 64 bytes * number of records on that node, master or replica), plus RAM for the data with its per-record overhead, which is explained on the Linux capacity planning page (calculate RAM for the whole cluster, then divide by the number of nodes to get per-node usage), plus additional RAM for any optional secondary indexes (SIs). You can list all the SIs you have in AQL:

$ aql

aql> show indexes

To calculate the memory consumed by SIs, you need the cardinality and the number of records indexed by each SI. You can get both with the following command (in this example the namespace is ns1 and the index name is my_index1):

$ asinfo -v 'sindex/ns1/my_index1'

In the output, keys = bin cardinality and entries = number of records indexed …

So look at your number of records, your replication factor (2 in your case), the size of the data in your records and the number of bins per record. Then calculate the full RAM usage for the cluster and divide by the number of nodes (assuming identically sized namespaces on all nodes) to get each node's usage.
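The calculation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not an official sizing formula. The 64 bytes per primary index entry, the ~1 GB for the process, and replication factor 2 on 5 nodes come from this thread. The record count, average data size and per-record overhead are hypothetical placeholders, so take the real overhead from the capacity planning page.

```python
GIB = 1024 ** 3

def per_node_ram_bytes(master_records, replication_factor, nodes,
                       avg_data_bytes, per_record_overhead_bytes,
                       sindex_bytes=0, process_bytes=GIB,
                       index_entry_bytes=64):
    """Rough per-node RAM estimate for one data-in-memory namespace."""
    # Master and replica copies are assumed to be spread evenly over the nodes.
    records_per_node = master_records * replication_factor / nodes
    # Primary index: one fixed-size entry per record held on the node.
    primary_index = records_per_node * index_entry_bytes
    # Record data plus the per-record overhead (placeholder value, see above).
    data = records_per_node * (avg_data_bytes + per_record_overhead_bytes)
    return process_bytes + primary_index + data + sindex_bytes

# Hypothetical inputs: 50M master records, 500 bytes of data per record,
# 50 bytes of assumed per-record overhead, no secondary indexes.
estimate = per_node_ram_bytes(
    master_records=50_000_000, replication_factor=2, nodes=5,
    avg_data_bytes=500, per_record_overhead_bytes=50)
print(f"estimated RAM per node: {estimate / GIB:.2f} GiB")
```

With more than one namespace, compute the index and data parts per namespace and add them up, counting the process allowance only once per node.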