I am seeing high memory usage on the Aerospike server while running UDF operations through the Aerospike Go client.
After every execution of the UDF from the client, memory usage keeps increasing, and the server eventually crashes once memory is exhausted.
Methods already tried -
- Set the “cache-enabled” property of “mod-lua” to “false” in the config - didn’t help.
- Used predicate filtering (deprecated in the latest Aerospike version) instead of a filter expression - didn’t help.
- Used an empty UDF to rule out a memory leak inside the UDF itself - didn’t help.
- If I only query the data (client method “Query”) instead of executing the UDF to delete it, memory isn’t affected.
Steps to replicate -
- Build a filter expression that matches on a bin value.
- Call “ExecuteUDF” on the Aerospike Go client with that filter, passing the UDF “invalidate” (code below) as the function to execute.
Aerospike Memory stats
Note: all memory snapshots were taken with no external traffic to the Aerospike instance, apart from the test script given below.
Command used - asadm -e "show stat" | grep mem
- Initial details before executing any query from client
memory_data_bytes|0
cluster_is_member |true
system_free_mem_pct |98
high-water-memory-pct |90
memory-size |4294967296
memory_free_pct |99
memory_used_bytes |19226816
memory_used_data_bytes |0
memory_used_index_bytes |19226816
memory_used_sindex_bytes |0
storage-engine.data-in-memory |false
xmem_id |0
- After running the code below once (observation - system_free_mem_pct has dropped from 98 to 79)
memory_data_bytes|0
cluster_is_member |true
system_free_mem_pct |79
high-water-memory-pct |90
memory-size |4294967296
memory_free_pct |99
memory_used_bytes |19226752
memory_used_data_bytes |0
memory_used_index_bytes |19226752
memory_used_sindex_bytes |0
storage-engine.data-in-memory |false
xmem_id |0
- After running the code below once more (observation - system_free_mem_pct has dropped further, from 79 to 60)
memory_data_bytes|0
cluster_is_member |true
system_free_mem_pct |60
high-water-memory-pct |90
memory-size |4294967296
memory_free_pct |99
memory_used_bytes |19226752
memory_used_data_bytes |0
memory_used_index_bytes |19226752
memory_used_sindex_bytes |0
storage-engine.data-in-memory |false
xmem_id |0
- After restarting Aerospike (it’s back to the initial state shown in the first snapshot)
memory_data_bytes|0
cluster_is_member |true
system_free_mem_pct |98
high-water-memory-pct |90
memory-size |4294967296
memory_free_pct |99
memory_used_bytes |19226816
memory_used_data_bytes |0
memory_used_index_bytes |19226816
memory_used_sindex_bytes |0
storage-engine.data-in-memory |false
xmem_id |0
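To make the trend in the snapshots above easier to read, here is a minimal Go sketch that parses `name |value` lines and prints the change between consecutive snapshots. The helper names `parseStats`, `statInt` and `delta` are hypothetical, and only two stats from each snapshot are embedded for brevity.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStats turns "name |value" lines (the shape of the grep output above)
// into a map. Lines without a '|' separator are skipped.
func parseStats(out string) map[string]string {
	stats := make(map[string]string)
	for _, line := range strings.Split(out, "\n") {
		parts := strings.SplitN(line, "|", 2)
		if len(parts) != 2 {
			continue
		}
		stats[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return stats
}

// statInt reads one stat as an integer.
func statInt(stats map[string]string, name string) (int64, error) {
	v, ok := stats[name]
	if !ok {
		return 0, fmt.Errorf("stat %q not found", name)
	}
	return strconv.ParseInt(v, 10, 64)
}

// delta returns cur[name] - prev[name].
func delta(prev, cur map[string]string, name string) (int64, error) {
	a, err := statInt(prev, name)
	if err != nil {
		return 0, err
	}
	b, err := statInt(cur, name)
	if err != nil {
		return 0, err
	}
	return b - a, nil
}

func main() {
	// Values copied from the first three snapshots above.
	snapshots := []string{
		"system_free_mem_pct |98\nmemory_used_bytes |19226816",
		"system_free_mem_pct |79\nmemory_used_bytes |19226752",
		"system_free_mem_pct |60\nmemory_used_bytes |19226752",
	}
	for i := 1; i < len(snapshots); i++ {
		prev, cur := parseStats(snapshots[i-1]), parseStats(snapshots[i])
		for _, name := range []string{"system_free_mem_pct", "memory_used_bytes"} {
			d, err := delta(prev, cur, name)
			if err != nil {
				fmt.Println("error:", err)
				continue
			}
			fmt.Printf("run %d: %s changed by %+d\n", i, name, d)
		}
	}
}
```

On the numbers above this reports a 19-point drop in system_free_mem_pct for each run, while memory_used_bytes changes by only -64 and then 0. So the growth is visible at the system level but not in the namespace's own memory accounting.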
Client code
Objective of this code - delete the records where bin “abc” = “pqr9999” (“abc” is the bin name here)
package main

import (
	"fmt"
	aero "github.com/aerospike/aerospike-client-go"
	"time"
)

// getClient connects to the single-node cluster with a custom client policy.
func getClient() (*aero.Client, error) {
	policy := aero.ClientPolicy{
		Timeout:               20 * time.Second,
		IdleTimeout:           50 * time.Second,
		ConnectionQueueSize:   100,
		MinConnectionsPerNode: 50,
	}
	return aero.NewClientWithPolicy(&policy, "<ip>", 3000)
}

// aeroDelete runs the "invalidate" UDF from module "delrec" as a background
// job on every record in the set that matches the filter expression.
func aeroDelete(client *aero.Client) error {
	ns := "test-ssd"
	set := "default_set"
	stm := aero.NewStatement(ns, set)
	queryPolicy := aero.NewQueryPolicy()
	// Match records whose string bin "abc" equals "pqr9999".
	queryPolicy.FilterExpression = aero.ExpEq(
		aero.ExpStringBin("abc"),
		aero.ExpStringVal("pqr9999"),
	)
	task, err := client.ExecuteUDF(queryPolicy, stm, "delrec", "invalidate")
	if err == nil {
		// Wait for the background job to finish and surface any error.
		for err := range task.OnComplete() {
			if err != nil {
				return err
			}
		}
	} else {
		return err
	}
	return nil
}

func main() {
	client, err := getClient()
	if err == nil {
		err = aeroDelete(client)
	}
	if err != nil {
		fmt.Println("error: ", err)
	}
	fmt.Println("done")
}
UDF function definition
function invalidate(rec)
    aerospike:remove(rec)
end
Aerospike instance details -
- Aerospike server version - 5.5.0.3
- Aerospike client (golang) version - 4.5.2
- Single-node Aerospike cluster
- All bins are non-indexed
- Single namespace “test-ssd” with a single set “default_set” holding 300419 objects
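As a sanity check on the memory stats above, the index figures line up with the object count if each primary-index entry takes 64 bytes (Aerospike's documented per-record index size, treated here as an assumption for this build). A minimal sketch, with `indexBytes` as a hypothetical helper name:

```go
package main

import "fmt"

// indexEntryBytes is the assumed size of one primary-index entry.
const indexEntryBytes = 64

// indexBytes returns the expected primary-index memory for a record count.
func indexBytes(objects int64) int64 {
	return objects * indexEntryBytes
}

func main() {
	const objects = 300419 // object count in set "default_set"

	// Matches memory_used_index_bytes before any run: 19226816.
	fmt.Println(indexBytes(objects))

	// Matches memory_used_index_bytes after the first run: 19226752.
	fmt.Println(indexBytes(objects - 1))
}
```

Under that assumption, the 64-byte drop after the first run is consistent with exactly one record having been removed, and the unchanged value after the second run with no matching record remaining. Neither accounts for the 19-point drop in system_free_mem_pct per run.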
Config file
service {
    user root
    group root
    batch-max-buffers-per-queue 512
    migrate-max-num-incoming 5
    migrate-threads 1
    paxos-single-replica-limit 1
    proto-fd-idle-ms 70000
    proto-fd-max 100000
}
namespace test-ssd {
    memory-size 4G
    allow-ttl-without-nsup true
    default-ttl 30D
    high-water-disk-pct 80
    high-water-memory-pct 90
    nsup-period 120
    replication-factor 2
    stop-writes-pct 100
    storage-engine device {
        defrag-lwm-pct 50
        device /dev/nvme0n1p1 /dev/sdb
        max-write-cache 128M
        read-page-cache true
        write-block-size 1M
    }
    background-scan-max-rps 100000
}
Can someone help with this please?