We are seeing the memory of the “asd” process rise slowly over time in production, and eventually we have to restart the Aerospike server. We tried to isolate the issue, and it looks like one of our UDFs is causing a leak. To reproduce it, we created a simplified version of the UDF and ran it on our test server.
Server: Community Edition - 4.8.0.5
Test UDF
local function createRecord(rec, requestId)
    -- first request for this key: create the record with a one-element list
    rec["REQS_LIST"] = list {requestId}
    record.set_ttl(rec, 300)
    aerospike:create(rec)
end

local function updateRecord(rec)
    -- refresh the TTL and write the record back
    record.set_ttl(rec, 300)
    aerospike:update(rec)
end

function addRequest(rec, requestId)
    if (not aerospike:exists(rec)) then
        createRecord(rec, requestId)
    else
        -- append the request id to the existing list bin
        local requestsList = rec["REQS_LIST"]
        list.append(requestsList, requestId)
        rec["REQS_LIST"] = requestsList
        updateRecord(rec)
    end
end

function removeRequest(rec)
    -- drop the oldest request id (index 1) from the list bin
    local requestsList = rec["REQS_LIST"]
    list.remove(requestsList, 1)
    rec["REQS_LIST"] = requestsList
    updateRecord(rec)
end
Our test script calls “addRequest”, waits a random 0-500 ms, and then calls “removeRequest”, cycling through 1000 different keys (1000 Aerospike records) in a loop. In total it ran 17 million add-then-remove cycles, at about 220 per second. Over those ~21 hours, the asd process's resident memory grew from 39 MB to 1.1 GB.
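For reference, here is a minimal single-threaded sketch of an equivalent driver using the Aerospike Python client. Our real script is different; the module name “requests.lua”, the test/demo namespace and set, and the key/request-id formats below are all illustrative, and the real script works keys concurrently to sustain ~220 cycles per second:

import random
import time

import aerospike

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()
client.udf_put('requests.lua')  # registers the Lua module above as "requests"

NUM_KEYS = 1000
TOTAL_CYCLES = 17000000

for i in range(TOTAL_CYCLES):
    key = ('test', 'demo', 'key-%d' % (i % NUM_KEYS))
    # append a request id to the record's list bin (creates the record if missing)
    client.apply(key, 'requests', 'addRequest', ['req-%d' % i])
    time.sleep(random.uniform(0.0, 0.5))  # random 0-500 ms pause
    # remove the oldest request id from the list bin again
    client.apply(key, 'requests', 'removeRequest', [])

client.close()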
Samples from the “top” command at the start and end of the run:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
Wed Jun 10 14:42:01 UTC 2020 - 15186 root 20 0 2266m 39m 4836 S 2.0 0.1 0:00.22 asd
Wed Jun 10 14:43:01 UTC 2020 - 15186 root 20 0 2293m 42m 5048 S 4.0 0.1 0:01.41 asd
Wed Jun 10 14:44:01 UTC 2020 - 15186 root 20 0 2301m 43m 5112 S 4.0 0.1 0:03.47 asd
...
Thu Jun 11 11:46:01 UTC 2020 - 15186 root 20 0 3363m 1.1g 5176 S 2.0 3.4 35:31.77 asd
Thu Jun 11 11:47:01 UTC 2020 - 15186 root 20 0 3363m 1.1g 5176 S 4.0 3.4 35:33.38 asd
Thu Jun 11 11:48:01 UTC 2020 - 15186 root 20 0 3363m 1.1g 5176 S 2.0 3.4 35:35.02 asd
Meanwhile, Aerospike's own summary reports memory usage at a steady 1% from start to end, so the growth is not reflected in the namespace accounting.
Please let me know if you need any more information.