Aerospike memory usage spikes up linearly

Hi,

I have a cloud VM with a network disk attached that’s acting as a single-node Aerospike server. I ingest small amounts of data into it throughout the day (sorted maps, lists, key-value, etc.).

I also use the same server to serve API responses; the API traffic is currently very light.

I find that the server’s memory usage creeps up slowly, and eventually I have to restart the server to make it usable again. Here’s my server configuration:

namespace crawler {
        replication-factor 2
        memory-size 1G
        default-ttl 0 #5 days, use 0 to never expire/evict.
        nsup-period 120

        # To use file storage backing, comment out the line above and use the
        # following lines instead.

        storage-engine device {
                device /dev/sda
                write-block-size 8M
                # data-in-memory true # Store data in memory in addition to file.
        }
}

Here’s the monitoring data from the past 14 days.

The low-memory periods are when the server had crashed and stayed down until I was able to get to it.

What can be the reason?

Which version of Aerospike are you using?

It may be useful to scrutinize the logs as well, to see whether this is coming from an increase in heap and whether it correlates with something obvious (the presence of secondary indices, for example).

Otherwise it is hard to guess. There have been occasional memory leaks of different sorts that were found and addressed, so I would check the release notes for details on known leaks that have been fixed.

Thanks @meher.

Is there anything in particular I should be looking at in Aerospike logs?

Most of the logs kind of look like the following:

aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:168) NODE-ID bb9020014ac4202 CLUSTER-SIZE 1
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:250)    cluster-clock: skew-ms 0
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:273)    system: total-cpu-pct 14 user-cpu-pct 12 kernel-cpu-pct 2 free-mem-kbytes 1479512 free-mem-pct 72
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:296)    process: cpu-pct 1 threads (9,60,51,47) heap-kbytes (1194851,1196380,1265664) heap-efficiency-pct 94.4
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:307)    in-progress: info-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:330)    fds: proto (7,110,103) heartbeat (0,0,0) fabric (0,0,0)
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:338)    heartbeat-received: self 0 foreign 0
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:368)    fabric-bytes-per-second: bulk (0,0) ctrl (0,0) meta (0,0) rw (0,0)
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:418)    batch-index: batches (226,0,0) delays 0
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (hist.c:321) histogram dump: batch-index (226 total) msec
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (hist.c:340)  (00: 0000000209) (01: 0000000016) (02: 0000000001)
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:431) {crawler} objects: all 3708063 master 3708063 prole 0 non-replica 0
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:493) {crawler} migrations: complete
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:522) {crawler} memory-usage: total-bytes 237316032 index-bytes 237316032 set-index-bytes 0 sindex-bytes 0 used-pct 22.10
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:594) {crawler} device-usage: used-bytes 6369981504 avail-pct 25 cache-read-pct 2.13
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:648) {crawler} client: tsvc (0,0) proxy (0,0,0) read (4179,0,0,58,0) write (21933,0,0,0) delete (0,0,0,0,0) udf (0,0,0,0) lang (0,0,0,0)
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:770) {crawler} batch-sub: tsvc (0,0) proxy (0,0,0) read (0,0,0,20499,0)
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:825) {crawler} scan: basic (2,0,0) aggr (0,0,0) udf-bg (1,0,0) ops-bg (0,0,0)
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (ticker.c:881) {crawler} udf-sub: tsvc (0,0) udf (0,26,0,0) lang (0,0,0,26)
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (hist.c:321) histogram dump: {crawler}-read (4237 total) msec
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (hist.c:331)  (00: 0000000902) (01: 0000003250) (02: 0000000047) (03: 0000000017)
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (hist.c:340)  (04: 0000000021)
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (hist.c:321) histogram dump: {crawler}-write (21933 total) msec
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (hist.c:331)  (00: 0000021789) (01: 0000000095) (02: 0000000008) (03: 0000000008)
aerospikedb_1  | Sep 25 2021 05:00:28 GMT: INFO (info): (hist.c:331)  (04: 0000000011) (05: 0000000018) (06: 0000000003) (08: 0000000001)

Also, please note:

  1. I haven’t really set any primary/secondary keys for my sets (I am new to Aerospike and couldn’t really figure out a way to do so)
  2. I am not too sure how the data is being stored - primary storage has been specified as flash in the config, but I am not sure if memory is being utilized for caching. I was expecting something along the lines of Redis-on-Flash, whereby hot (recently accessed) values and all keys are stored in memory, while cold values are stored on flash.

Could it be that my memory settings are wrong? That is, is my in-memory cache growing and growing beyond machine memory? For reference, my machine has 1 vCPU and 2GB RAM.

Thanks!

With the default post-write-queue of 256 blocks, that’s 8MiB*256=2GiB per device. You need to turn that down. Do you have read-page-cache on? What’s your max-write-cache? Also, Aerospike allocates RAM in 1GiB slabs… 2GB is really quite small, especially if you want 8MiB objects (massive). Aerospike will need some tuning to work on machines that small.
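To make that arithmetic easy to redo for other settings, here is a minimal Python sketch. The function names and the 256 MiB budget are illustrative assumptions, not anything from Aerospike itself; the only inputs taken from this thread are the 8M write-block-size and the default queue depth of 256.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def post_write_queue_bytes(write_block_size, queue_blocks=256, devices=1):
    """Worst-case RAM held by the post-write-queue once it has filled up.

    Hypothetical helper: the queue keeps `queue_blocks` recently written
    blocks per device, each `write_block_size` bytes.
    """
    return write_block_size * queue_blocks * devices


def max_queue_blocks(ram_budget, write_block_size, devices=1):
    """Largest queue depth (in blocks) that fits in `ram_budget` bytes."""
    return ram_budget // (write_block_size * devices)


if __name__ == "__main__":
    full = post_write_queue_bytes(8 * MiB)
    print(f"default queue with 8MiB blocks: {full / GiB:.1f} GiB")
    # The 256 MiB budget is only an example figure for a 2GB machine.
    print("blocks that fit in 256 MiB:", max_queue_blocks(256 * MiB, 8 * MiB))
```

Because the queue only fills as blocks are written, this worst case is reached gradually, which would be consistent with memory creeping up over days rather than jumping at startup.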

Good catch @Albot! Yes, that would do it, and it would make memory grow in proportion to writes. For a small namespace that can indeed make a big difference. So I would definitely reduce the post-write-queue significantly (based on how much extra RAM is available), and your other suggestions make sense. Having a memory-size of 1GiB does indicate this is a very small namespace.

Well, never mind, this gives it away from the shared logs (1.4GiB is 72% of the system RAM):

free-mem-kbytes 1479512 free-mem-pct 72


Thanks @Albot and @meher for your responses.

I have 2GB of RAM (I may look into upgrading to an instance with more RAM in the future). So, will setting max-write-cache to 1GB solve all the problems? That is, will the max-write-cache value supersede the default post-write-queue value of 256? Or should I explicitly set post-write-queue to 128?

Also, as you can see in my original post, I have explicitly set memory-size to 1GB. So, why does the cache size blow past it?

Thanks!

Good question. The memory-size setting only tracks the index size, the sindex size (if used), data-in-memory (if enabled), and the set-index (if used), in order to enforce evictions (if configured) and stop-writes.

But there are other memory allocations that are not counted against the memory-size limit, such as XDR transaction queues, the write-queue cache (max-write-cache), the post-write-queue, and other fabric buffers. Those are typically fairly small, but in this case, with such a low-memory system and a large write-block-size, they do cause this problem.
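One rough way to see that gap in the logs posted above is to compare the process heap with the namespace's tracked memory. The Python sketch below does that for two ticker lines copied from this thread. It assumes the first value in `heap-kbytes` is allocated heap in KiB and that the index lives in the process heap, so treat the result as an estimate only; the function names are made up for illustration.

```python
import re

# Patterns for the two ticker lines of interest (formats as seen in this thread).
HEAP_RE = re.compile(r"heap-kbytes \((\d+),(\d+),(\d+)\)")
TRACKED_RE = re.compile(r"\{[^}]+\} memory-usage: total-bytes (\d+)")


def heap_allocated_bytes(line):
    """Allocated heap in bytes, assuming the first heap-kbytes value is KiB allocated."""
    m = HEAP_RE.search(line)
    return int(m.group(1)) * 1024 if m else None


def tracked_bytes(line):
    """Bytes counted against memory-size for the namespace on this line."""
    m = TRACKED_RE.search(line)
    return int(m.group(1)) if m else None


PROCESS_LINE = (
    "(ticker.c:296)    process: cpu-pct 1 threads (9,60,51,47) "
    "heap-kbytes (1194851,1196380,1265664) heap-efficiency-pct 94.4"
)
NAMESPACE_LINE = (
    "(ticker.c:522) {crawler} memory-usage: total-bytes 237316032 "
    "index-bytes 237316032 set-index-bytes 0 sindex-bytes 0 used-pct 22.10"
)

if __name__ == "__main__":
    heap = heap_allocated_bytes(PROCESS_LINE)
    tracked = tracked_bytes(NAMESPACE_LINE)
    untracked = heap - tracked
    print(f"heap allocated : {heap / 2**20:.0f} MiB")
    print(f"tracked        : {tracked / 2**20:.0f} MiB")
    print(f"not tracked    : {untracked / 2**20:.0f} MiB")
```

On these two lines the heap is far larger than the tracked figure, which fits the explanation that most of the growth comes from allocations outside the memory-size accounting.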
