I am seeing server crashes with "out of memory" errors when running backend jobs that perform migration procedures.
Each job does roughly the following (see the sketch after this list):
- scans/filters set1 (1-5 million records); for each matching record, adds an item to 3 llists in set2
- counts the size of these llists
- removes the llists
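For reference, here is a minimal sketch of one of these jobs, assuming the Java client's LDT API (getLargeList, add, size, destroy; exact signatures may differ between client versions). The key, bin and filter names are made up, and only one of the three llists is shown:

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.ScanCallback;
import com.aerospike.client.Value;
import com.aerospike.client.large.LargeList;
import com.aerospike.client.policy.ScanPolicy;
import com.aerospike.client.policy.WritePolicy;

public class MigrationJob {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("aerospike-vm-1.internal", 3000);

        // Hypothetical record in set2 that holds the llist bin
        // (the real job uses 3 llists; only one is shown here).
        Key targetKey = new Key("ns", "set2", "migration-job-1");
        final LargeList llist = client.getLargeList(new WritePolicy(), targetKey, "items");

        // Step 1: scan set1 (1-5 million records) and, for each record
        // that passes the filter, add an item to the llist.
        client.scanAll(new ScanPolicy(), "ns", "set1", new ScanCallback() {
            public void scanCallback(Key key, Record record) {
                Object id = record.getValue("id"); // hypothetical bin and filter
                if (id != null) {
                    llist.add(Value.get(id));
                }
            }
        });

        // Step 2: count the items in the llist.
        System.out.println("llist size: " + llist.size());

        // Step 3: remove the llist once the job is done.
        llist.destroy();

        client.close();
    }
}
```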
After two or three jobs finish, several nodes crash with an out-of-memory error. Each job consumes almost 2 GB of system RAM, which is never released.
Current cluster system memory (very different values between nodes):

```
                      n1   n2   n3   n4
system_free_mem_pct   70   81   41   63
```
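I collected those values per node with something like the following (system_free_mem_pct comes from the server statistics; shown here for n3):

```
$ asinfo -h aerospike-vm-3.internal -v 'statistics' | tr ';' '\n' | grep system_free_mem_pct
system_free_mem_pct=41
```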
The cluster has four nodes on Google Cloud with the following setup (config sketch after this list):
- n1-highmem-4: 4 vCPUs, 26 GB RAM + 100 GB SSD (persistent, not local)
- namespace:
  - 20 GB RAM
  - the entire SSD disk
  - only indexes in RAM
  - ttl=0, write-block-size=128K, LDT enabled, disable-odirect=true
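The relevant part of aerospike.conf looks roughly like this (a sketch under the settings listed above; the device path is a placeholder, and the parameter names are from the 3.x docs, so please check them against your version):

```
namespace ns {
    replication-factor 2
    memory-size 20G            # index in RAM
    default-ttl 0              # never expire
    ldt-enabled true

    storage-engine device {
        device /dev/sdb        # placeholder for the 100 GB persistent SSD
        write-block-size 128K
        disable-odirect true
        data-in-memory false   # only indexes in RAM, data on SSD
    }
}
```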
```
$ asmonitor info namespaces

ip/namespace                 Avail   Evicted   Master       Repl     Stop     Used      Used     Used     Used    hwm    hwm
                             Pct     Objects   Objects      Factor   Writes   Disk      Disk %   Mem      Mem %   Disk   Mem
aerospike-vm-4.internal/ns   82      0         10,212,416   2        false    14.00 G   14       2.48 G   13      50     60
aerospike-vm-3.internal/ns   83      0         9,683,896    2        false    14.85 G   15       2.33 G   12      50     60
aerospike-vm-2.internal/ns   84      0         9,453,524    2        false    14.67 G   15       2.06 G   11      50     60
aerospike-vm-1.internal/ns   85      0         10,113,823   2        false    13.53 G   14       2.11 G   11      50     60
```
Am I doing something wrong?
How can I prevent these crashes?
Should I tune the configuration for this kind of job?
Regards