Aerospike RAM consumption keeps increasing and crashes (OOM) (AER-6129)

I have a single-node Aerospike running on Docker. I write data once at startup and afterwards perform only read operations using the Golang Aerospike client.
Aerospike keeps consuming memory and crashes after a while when no memory is available (OOM).
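
For context, the client setup is just the standard Go client; a rough sketch (the host name here is simply the Docker service name in my setup):

// Uses github.com/aerospike/aerospike-client-go.
// "aerospike" is the Docker service name in my setup.
client, err := aerospike.NewClient("aerospike", 3000)
if err != nil {
    log.Fatal(err)
}
defer client.Close()
// All traffic after the initial load is reads issued through this client.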

Following is my aerospike.conf:

service {
	user root
	group root
	paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
	pidfile /var/run/aerospike/asd.pid
	service-threads 4
	transaction-queues 4
	transaction-threads-per-queue 4
	proto-fd-max 15000
}

logging {
	file /var/log/aerospike/aerospike.log {
		context any warning
	}

	console {
		context any warning 
	}
}

network {
	service {
		address eth0
		port 3000
	}

	heartbeat {
		address eth0
		mode mesh
		port 3002
		interval 150
		timeout 10
	}

	fabric {
		address eth0
		port 3001
	}

	info {
		port 3003
	}
}

namespace my_namespace {
	replication-factor 1
	memory-size 1G
	default-ttl 5d # 5 days, use 0 to never expire/evict.

	# storage-engine memory

	storage-engine device {
		file /opt/aerospike/data/my_namespace.dat
		filesize 5G
		write-block-size 128K
		data-in-memory true
	}
}

The RAM usage shown in the Aerospike Console is negligible, but in htop I can see that it keeps increasing.

Aerospike Console: [screenshot]

htop: [screenshot]

I read the following discussion but couldn’t find a solution: High memory usage on Kubernetes GKE using helm chart

Update:
I enabled direct-files as follows:

storage-engine device {
    file /opt/aerospike/data/my_namespace.dat
    filesize 5G
    write-block-size 128K
    data-in-memory true
    direct-files true
}

but still the problem persists.

Output of asadm -e "summary" and asadm -e "show stat":
$ asadm -e "summary"
Seed:        [('aerospike', 3000, None)]
Config_file: /root/.aerospike/astools.conf, /etc/aerospike/astools.conf
Cluster
=======

   1.   Server Version     :  C-4.6.0.2
   2.   OS Version         :
   3.   Cluster Size       :  1
   4.   Devices            :  Total 2, per-node 2
   5.   Memory             :  Total 5.000 GB, 0.80% used (40.960 MB), 99.20% available (4.960 GB)
   6.   Disk               :  Total 10.000 GB, 0.00% used (3.422 KB), 99.00% available contiguous space (9.900 GB)
   7.   Usage (Unique Data):  0.000 B  in-memory, 2.953 KB on-disk
   8.   Active Namespaces  :  1 of 2
   9.   Features           :  KVS, Query, SINDEX


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespaces~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace            Devices                      Memory                       Disk   Replication    Rack     Master           Usage           Usage
        .   (Total,Per-Node)        (Total,Used%,Avail%)       (Total,Used%,Avail%)        Factor   Aware    Objects   (Unique-Data)   (Unique-Data)
        .                  .                           .                          .             .       .          .       In-Memory         On-Disk
go_rtb      (1, 1)             (4.000 GB, 1.00, 99.00)     (5.000 GB, 0.00, 99.00)              1   False   12.000          0.000 B         2.953 KB
test        (1, 1)             (1.000 GB, 0.00, 100.00)    (5.000 GB, 0.00, 99.00)              1   False    0.000          0.000 B         0.000 B
Number of rows: 2
$ asadm -e "show stat"

Output: attaching the show stat output for just the one namespace that is in use (the other is the test namespace).

$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7976        7101         139           0         736         632
Swap:             0           0           0
$ top
top - 14:31:26 up 19 days, 22:15,  1 user,  load average: 0.04, 0.06, 0.07
Tasks: 124 total,   1 running,  73 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.2 us,  0.7 sy,  0.0 ni, 98.1 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8168096 total,   139452 free,  7275236 used,   753408 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   646200 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 7891 root      20   0  9.877g 6.542g   6992 S   3.0 84.0  17:06.03 asd
 8685 root      20   0 1602700  21160   4980 S   2.6  0.3  18:01.19 main
 1024 root      20   0 2437512 152360  24812 S   0.7  1.9 233:31.24 dockerd
 7787 root      20   0  118692  37020  13464 S   0.7  0.5   0:56.69 amc
14643 root      20   0  105872   6868   5776 S   0.7  0.1   0:00.07 sshd

The above outputs were captured at the same time that htop was reporting 84% memory usage.

So you have one node, that node has only 12 records, and it is using 6.5 GiB of RAM?

Starting a single-node asd within a Docker container locally with 0 records shows only 56 MiB of RAM used. Is there anything else going on in your scenario?

Yes, I have a single node with 4 vCPUs and 8 GB RAM. The setup runs on Docker Swarm and the client is Golang.

The write is performed only once at the beginning; after that it is just read operations at approximately 2,000 TPS.

I’ve found the cause of the ever-increasing RAM usage:

I have a secondary index on a bin called my_bin, which is a list of strings. I first filter using this secondary index and then apply predicate filtering on the results.
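
For reference, the secondary index on my_bin was created with the Go client roughly like this (a sketch; the index name and error handling here are illustrative):

// Create a string index over the elements of the my_bin list.
task, err := client.CreateComplexIndex(
    nil,                // default write policy
    "my_namespace",     // namespace
    "seta",             // set
    "my_bin_list_idx",  // index name (illustrative)
    "my_bin",           // bin containing a list of strings
    aerospike.STRING,   // index the string values
    aerospike.ICT_LIST, // collection type: list
)
if err != nil {
    return err
}
// Wait for the index build to complete before querying it.
if err := <-task.OnComplete(); err != nil {
    return err
}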

The following Golang code performs the operation described above:

// Secondary-index query against set "seta"; the filter uses the list index on my_bin.
stmt := aerospike.NewStatement(repo.as.Namespace, "seta")
if err := stmt.SetFilter(
    aerospike.NewContainsFilter("my_bin", aerospike.ICT_LIST, pm.ID),
); err != nil {
    return nil, err
}
// Predicate filtering: each clause is built as (value, var, compare, bin, iterate),
// and the five clauses are AND-ed together at the end (NewPredExpAnd(5)).
if err := stmt.SetPredExp(
    aerospike.NewPredExpStringValue("Michael"),
    aerospike.NewPredExpStringVar("name_list"),
    aerospike.NewPredExpStringEqual(),
    aerospike.NewPredExpListBin("name"),
    aerospike.NewPredExpListIterateOr("name_list"),

    aerospike.NewPredExpStringValue("Developer"),
    aerospike.NewPredExpStringVar("job_list"),
    aerospike.NewPredExpStringEqual(),
    aerospike.NewPredExpListBin("job"),
    aerospike.NewPredExpListIterateOr("job_list"),

    aerospike.NewPredExpStringValue("Golang"),
    aerospike.NewPredExpStringVar("lang_list"),
    aerospike.NewPredExpStringEqual(),
    aerospike.NewPredExpListBin("lang"),
    aerospike.NewPredExpListIterateOr("lang_list"),

    aerospike.NewPredExpStringValue("Scranton"),
    aerospike.NewPredExpStringVar("ctr_list"),
    aerospike.NewPredExpStringEqual(),
    aerospike.NewPredExpListBin("country"),
    aerospike.NewPredExpListIterateAnd("ctr_list"),

    aerospike.NewPredExpStringValue("Backend"),
    aerospike.NewPredExpStringVar("intrst_list"),
    aerospike.NewPredExpStringEqual(),
    aerospike.NewPredExpListBin("interest"),
    aerospike.NewPredExpListIterateAnd("intrst_list"),
    aerospike.NewPredExpAnd(5),
); err != nil {
    return nil, err
}

rs, err := repo.as.AsClient().Query(nil, stmt)
if err != nil {
    return nil, err
}

When I disable just the predicate filtering, the RAM usage stops increasing and remains stable. But when I enable predicate filtering, RAM keeps increasing and the process eventually crashes.
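
By disabling I mean running the same query with only the secondary-index filter and no SetPredExp call; any remaining checks can then be done on the returned records in application code. A rough sketch (the client-side check is illustrative):

stmt := aerospike.NewStatement(repo.as.Namespace, "seta")
if err := stmt.SetFilter(
    aerospike.NewContainsFilter("my_bin", aerospike.ICT_LIST, pm.ID),
); err != nil {
    return nil, err
}
// No SetPredExp here: with only the index filter, RAM usage stays flat.

rs, err := repo.as.AsClient().Query(nil, stmt)
if err != nil {
    return nil, err
}
for res := range rs.Results() {
    if res.Err != nil {
        return nil, res.Err
    }
    // Illustrative client-side replacement for one predexp clause:
    // keep the record only if its "name" list contains "Michael".
    if names, ok := res.Record.Bins["name"].([]interface{}); ok {
        for _, n := range names {
            if n == "Michael" {
                // keep res.Record
            }
        }
    }
}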

Is there something wrong with the code pasted above, or is this a memory-leak bug in Aerospike?


I want to keep using predicate filtering as it simplifies a lot of things.

Thank you for the example code. I’ve been able to reproduce the leak and have filed an internal bug report (AER-6129).


Thank you for your help so far. I hope a fix is released soon, as I’m on a really tight deadline and we rely heavily on Aerospike for our needs.

We have identified the leak; keep an eye on the release notes for AER-6129. It may not be in the upcoming 4.7.0.x release, since the build process had already begun yesterday and this isn’t a regression, so the fix will likely ship in a hotfix within a few days after the 4.7.0.x release.

Quick update: because we are incorporating predexp capabilities into more APIs in Aerospike 4.7.0.x, we have decided to interrupt the release process to include a patch for this issue. The current plan is to ship 4.7.0.2 by end of day this coming Monday (30 SEP). After this release, hotfix patches will begin to appear for some older versions, including the 4.6 versions, as QA completes regression testing for each.

Also, the hotfix release for 4.6.0, version 4.6.0.5, includes a fix for this memory leak.

Thanks again for reporting this issue to us.
