Out of memory crashes


#1

My servers crash with “out of memory” errors when I run backend jobs that perform migration procedures.

Each job does things like:

  • filter set1 (1-5 million records) and, for each record, add an item to 3 llists in set2
  • count the size of each llist
  • remove the llist

After two or three jobs have finished, several nodes crash with an out of memory error. Every job takes almost 2 GB of system RAM, which is never released.

Current system memory across the cluster (values differ widely between nodes):

                     n1  n2  n3  n4
system_free_mem_pct  70  81  41  63

The cluster has four nodes on Google Cloud, each with the following setup:

  • n1-highmem-4: 4 CPUs, 26 GB RAM + 100 GB SSD (not local)
  • namespace:
      • 20 GB RAM
      • entire SSD disk
      • only indexes in RAM
      • ttl=0, write-block-size=128k, ldt enabled, disable-odirect=true

asmonitor info namespaces

ip/namespace                   Avail   Evicted       Master     Repl     Stop      Used   Used      Used   Used    hwm   hwm
                                 Pct   Objects      Objects   Factor   Writes      Disk   Disk       Mem    Mem   Disk   Mem
                                   .         .            .        .        .         .      %         .      %      .     .
aerospike-vm-4.internal/ns        82         0   10,212,416        2    false   14.00 G     14    2.48 G     13     50    60
aerospike-vm-3.internal/ns        83         0    9,683,896        2    false   14.85 G     15    2.33 G     12     50    60
aerospike-vm-2.internal/ns        84         0    9,453,524        2    false   14.67 G     15    2.06 G     11     50    60
aerospike-vm-1.internal/ns        85         0   10,113,823        2    false   13.53 G     14    2.11 G     11     50    60
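
To make the gap concrete, here is a rough back-of-the-envelope calculation of how much RAM is in use on each node beyond what the namespace reports. It is only a sketch: it assumes that n1..n4 in the first table correspond to aerospike-vm-1..aerospike-vm-4, and that each node has the full 26 GB available. The remainder also includes the OS and any other processes, so it is an upper bound on what the jobs might be holding.

```python
# Rough estimate of RAM in use on each node that the namespace
# "Used Mem" figure does not account for.

NODE_RAM_GB = 26.0  # n1-highmem-4 instance size

# system_free_mem_pct per node, copied from the first table.
free_pct = {"n1": 70, "n2": 81, "n3": 41, "n4": 63}

# Namespace "Used Mem" per node in GB, copied from the asmonitor output.
# ASSUMPTION: n1..n4 map to aerospike-vm-1..aerospike-vm-4.
ns_used_gb = {"n1": 2.11, "n2": 2.06, "n3": 2.33, "n4": 2.48}


def unaccounted_gb(node):
    """System RAM in use minus the namespace's reported memory use."""
    system_used = NODE_RAM_GB * (100 - free_pct[node]) / 100
    return system_used - ns_used_gb[node]


report = {node: round(unaccounted_gb(node), 2) for node in free_pct}

for node, gb in report.items():
    print(f"{node}: about {gb} GB in use outside the namespace")
```

Under these assumptions the remainder is far larger on some nodes than on others, even though the namespace memory use is nearly identical everywhere.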

Am I doing something wrong?

How can I prevent these crashes?

Should I tune the configuration for this kind of job?

Regards