High memory usage of Aerospike Server while performing UDF operations via aerospike golang client

I am observing high memory usage on the Aerospike server while performing UDF operations via the Aerospike Go client.

After every execution of the UDF from the client, memory usage keeps increasing, and ultimately Aerospike crashes once memory is exhausted.

Methods already tried

  1. Set the “cache-enabled” property of “mod-lua” in the config to “false” - didn’t help.
  2. Tried predicate filtering (deprecated in recent Aerospike versions) instead of a filter expression - didn’t help.
  3. Tried an empty UDF function as well, to rule out a memory leak inside the UDF itself - didn’t help.
  4. If I just query the data (using the client method “Query”) instead of executing the UDF to delete it, memory isn’t impacted.

Steps to replicate -

  1. Use a filter expression that filters on a bin value.
  2. Call “ExecuteUDF” of the Aerospike Go client with the above filter, passing the UDF function to execute - “invalidate” (code below).

Aerospike Memory stats
Note: all memory snapshots were taken while there was no external traffic to the Aerospike instance, other than the test script below.

Command used - asadm -e "show stat" | grep mem

  1. Initial stats, before executing any query from the client
memory_data_bytes|0
cluster_is_member                    |true
system_free_mem_pct                  |98
high-water-memory-pct                    |90
memory-size                              |4294967296
memory_free_pct                          |99
memory_used_bytes                        |19226816
memory_used_data_bytes                   |0
memory_used_index_bytes                  |19226816
memory_used_sindex_bytes                 |0
storage-engine.data-in-memory            |false
xmem_id                                  |0
  2. After running the code below once (observation: system_free_mem_pct has dropped from 98 to 79)
memory_data_bytes|0
cluster_is_member                    |true
system_free_mem_pct                  |79
high-water-memory-pct                    |90
memory-size                              |4294967296
memory_free_pct                          |99
memory_used_bytes                        |19226752
memory_used_data_bytes                   |0
memory_used_index_bytes                  |19226752
memory_used_sindex_bytes                 |0
storage-engine.data-in-memory            |false
xmem_id                                  |0
  3. After running the code below once more (observation: system_free_mem_pct has dropped further, to 60)
memory_data_bytes|0
cluster_is_member                    |true
system_free_mem_pct                  |60
high-water-memory-pct                    |90
memory-size                              |4294967296
memory_free_pct                          |99
memory_used_bytes                        |19226752
memory_used_data_bytes                   |0
memory_used_index_bytes                  |19226752
memory_used_sindex_bytes                 |0
storage-engine.data-in-memory            |false
xmem_id                                  |0
  4. After restarting Aerospike (it’s back to the initial state in #1)
memory_data_bytes|0
cluster_is_member                    |true
system_free_mem_pct                  |98
high-water-memory-pct                    |90
memory-size                              |4294967296
memory_free_pct                          |99
memory_used_bytes                        |19226816
memory_used_data_bytes                   |0
memory_used_index_bytes                  |19226816
memory_used_sindex_bytes                 |0
storage-engine.data-in-memory            |false
xmem_id                                  |0

Client code
Objective of this code - delete records where bin “abc” = “pqr9999”.

package main

import (
	"fmt"
	"time"

	aero "github.com/aerospike/aerospike-client-go"
)

func getClient() (*aero.Client, error) {
	policy := aero.ClientPolicy{
		Timeout:               20 * time.Second,
		IdleTimeout:           50 * time.Second,
		ConnectionQueueSize:   100,
		MinConnectionsPerNode: 50,
	}

	return aero.NewClientWithPolicy(&policy, "<ip>", 3000)
}

func aeroDelete(client *aero.Client) error {
	ns := "test-ssd"
	set := "default_set"

	stm := aero.NewStatement(ns, set)
	queryPolicy := aero.NewQueryPolicy()

	queryPolicy.FilterExpression = aero.ExpEq(
		aero.ExpStringBin("abc"),
		aero.ExpStringVal("pqr9999"),
	)

	task, err := client.ExecuteUDF(queryPolicy, stm, "delrec", "invalidate")
	if err != nil {
		return err
	}

	// Wait for the background UDF job to finish.
	for err := range task.OnComplete() {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	client, err := getClient()
	if err == nil {
		err = aeroDelete(client)
	}

	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Println("done")
}

UDF function definition

function invalidate(rec)
    aerospike:remove(rec)
end

Aerospike instance details -

  • Aerospike server version - 5.5.0.3
  • Aerospike Go client version - 4.5.2
  • Single-node Aerospike
  • All bins are non-indexed
  • A single namespace - “test-ssd” - with a single set “default_set” containing 300,419 objects

Config file

service {
  user root
  group root
  batch-max-buffers-per-queue 512
  migrate-max-num-incoming 5
  migrate-threads 1
  paxos-single-replica-limit 1
  proto-fd-idle-ms 70000
  proto-fd-max 100000
}

namespace test-ssd {
  memory-size 4G
  allow-ttl-without-nsup true
  default-ttl 30D
  high-water-disk-pct 80
  high-water-memory-pct 90
  nsup-period 120
  replication-factor 2
  stop-writes-pct 100
  storage-engine device {
    defrag-lwm-pct 50
    device /dev/nvme0n1p1 /dev/sdb
    max-write-cache 128M
    read-page-cache true
    write-block-size 1M
  }
  background-scan-max-rps 100000
}

Can someone help with this please?

It seems this is a known issue that was addressed in April, in server version 5.5.0.9:

  • [AER-6418] - (UDF) For namespaces with data not in memory, executing a UDF with an expression filter that evaluates false will leak memory.

Thanks a lot @meher. Upgrading the Aerospike server to v5.7.0.8 (the current latest) helped.

Thanks for closing the loop!
