Hi I’m facing a memory leak with the latest aerospike 3.11.0.1 The last log lines about memory are:
Jan 12 2017 07:18:45 GMT: INFO (info): (ticker.c:428) {primary} memory-usage: total-bytes 2775205050 index-bytes 1113077632 sindex-bytes 1804204 data-bytes 1660323214 used-pct 11.24
Jan 12 2017 07:18:45 GMT: INFO (info): (ticker.c:458) {primary} device-usage: used-bytes 4769849856 avail-pct 94
Jan 12 2017 07:18:45 GMT: INFO (info): (ticker.c:551) {primary} client: tsvc (0,7139) proxy (3345,0,0) read (1232971977,0,0,313365859) write (2808217,15,41) delete (736,0,0,50223) udf (0,0,0) lang (0,0,0,0)
Jan 12 2017 07:18:45 GMT: INFO (info): (ticker.c:632) {primary} query: basic (13292712,1319) aggr (0,0) udf-bg (0,0)
and then in dmesg:
[Thu Jan 12 07:18:46 2017 <73245.073014>] asd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Thu Jan 12 07:18:46 2017 < 0.000005>] asd cpuset=/ mems_allowed=0
[Thu Jan 12 07:18:46 2017 < 0.000004>] CPU: 3 PID: 10187 Comm: asd Tainted: G OX 3.13.0-107-generic #154-Ubuntu
[Thu Jan 12 07:18:46 2017 < 0.000001>] Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
[Thu Jan 12 07:18:46 2017 < 0.000002>] 0000000000000000 ffff8807857db970 ffffffff8172d229 ffff880784463000
[Thu Jan 12 07:18:46 2017 < 0.000004>] 0000000000000000 ffff8807857db9f8 ffffffff817277c8 0000000000000000
[Thu Jan 12 07:18:46 2017 < 0.000003>] 0000000000000000 0000000000000012 0000000000000001 ffffffff81c3fa60
[Thu Jan 12 07:18:46 2017 < 0.000004>] Call Trace:
[Thu Jan 12 07:18:46 2017 < 0.000008>] [<ffffffff8172d229>] dump_stack+0x64/0x82
[Thu Jan 12 07:18:46 2017 < 0.000003>] [<ffffffff817277c8>] dump_header+0x7f/0x1f1
[Thu Jan 12 07:18:46 2017 < 0.000005>] [<ffffffff81156d81>] oom_kill_process+0x201/0x360
[Thu Jan 12 07:18:46 2017 < 0.000005>] [<ffffffff812dde55>] ? security_capable_noaudit+0x15/0x20
[Thu Jan 12 07:18:46 2017 < 0.000003>] [<ffffffff81157511>] out_of_memory+0x471/0x4b0
[Thu Jan 12 07:18:46 2017 < 0.000004>] [<ffffffff8115d82c>] __alloc_pages_nodemask+0xa6c/0xb90
[Thu Jan 12 07:18:46 2017 < 0.000003>] [<ffffffff8119bf93>] alloc_pages_current+0xa3/0x160
[Thu Jan 12 07:18:46 2017 < 0.000003>] [<ffffffff81153837>] __page_cache_alloc+0x97/0xc0
[Thu Jan 12 07:18:46 2017 < 0.000002>] [<ffffffff811552a5>] filemap_fault+0x185/0x410
[Thu Jan 12 07:18:46 2017 < 0.000003>] [<ffffffff8117a4bf>] __do_fault+0x6f/0x530
[Thu Jan 12 07:18:46 2017 < 0.000003>] [<ffffffff8117e440>] handle_mm_fault+0x5b0/0xf10
[Thu Jan 12 07:18:46 2017 < 0.000002>] [<ffffffff811531f0>] ? sleep_on_page+0x20/0x20
[Thu Jan 12 07:18:46 2017 < 0.000003>] [<ffffffff817393c4>] __do_page_fault+0x184/0x560
[Thu Jan 12 07:18:46 2017 < 0.000003>] [<ffffffff81616bc7>] ? SYSC_recvfrom+0x127/0x160
[Thu Jan 12 07:18:46 2017 < 0.000004>] [<ffffffff8120be90>] ? SyS_epoll_ctl+0xf0/0x9f0
[Thu Jan 12 07:18:46 2017 < 0.000002>] [<ffffffff817397ba>] do_page_fault+0x1a/0x70
[Thu Jan 12 07:18:46 2017 < 0.000003>] [<ffffffff81735ae8>] page_fault+0x28/0x30
[Thu Jan 12 07:18:46 2017 < 0.000001>] Mem-Info:
[Thu Jan 12 07:18:46 2017 < 0.000002>] Node 0 DMA per-cpu:
[Thu Jan 12 07:18:46 2017 < 0.000002>] CPU 0: hi: 0, btch: 1 usd: 0
[Thu Jan 12 07:18:46 2017 < 0.000001>] CPU 1: hi: 0, btch: 1 usd: 0
[Thu Jan 12 07:18:46 2017 < 0.000001>] CPU 2: hi: 0, btch: 1 usd: 0
[Thu Jan 12 07:18:46 2017 < 0.000001>] CPU 3: hi: 0, btch: 1 usd: 0
[Thu Jan 12 07:18:46 2017 < 0.000000>] Node 0 DMA32 per-cpu:
[Thu Jan 12 07:18:46 2017 < 0.000002>] CPU 0: hi: 186, btch: 31 usd: 30
[Thu Jan 12 07:18:46 2017 < 0.000001>] CPU 1: hi: 186, btch: 31 usd: 97
[Thu Jan 12 07:18:46 2017 < 0.000001>] CPU 2: hi: 186, btch: 31 usd: 19
[Thu Jan 12 07:18:46 2017 < 0.000001>] CPU 3: hi: 186, btch: 31 usd: 8
[Thu Jan 12 07:18:46 2017 < 0.000001>] Node 0 Normal per-cpu:
[Thu Jan 12 07:18:46 2017 < 0.000001>] CPU 0: hi: 186, btch: 31 usd: 0
[Thu Jan 12 07:18:46 2017 < 0.000001>] CPU 1: hi: 186, btch: 31 usd: 0
[Thu Jan 12 07:18:46 2017 < 0.000001>] CPU 2: hi: 186, btch: 31 usd: 0
[Thu Jan 12 07:18:46 2017 < 0.000001>] CPU 3: hi: 186, btch: 31 usd: 0
[Thu Jan 12 07:18:46 2017 < 0.000003>] active_anon:7670523 inactive_anon:74 isolated_anon:0
[Thu Jan 12 07:18:46 2017 < 0.000000>] active_file:113 inactive_file:0 isolated_file:0
[Thu Jan 12 07:18:46 2017 < 0.000000>] unevictable:1165 dirty:5 writeback:0 unstable:0
[Thu Jan 12 07:18:46 2017 < 0.000000>] free:40535 slab_reclaimable:8114 slab_unreclaimable:17388
[Thu Jan 12 07:18:46 2017 < 0.000000>] mapped:870 shmem:2255 pagetables:16505 bounce:0
[Thu Jan 12 07:18:46 2017 < 0.000000>] free_cma:0
[Thu Jan 12 07:18:46 2017 < 0.000002>] Node 0 DMA free:15904kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Thu Jan 12 07:18:46 2017 < 0.000003>] lowmem_reserve[]: 0 3744 30661 30661
[Thu Jan 12 07:18:46 2017 < 0.000003>] Node 0 DMA32 free:115872kB min:8248kB low:10308kB high:12372kB active_anon:3681588kB inactive_anon:8kB active_file:160kB inactive_file:0kB unevictable:276kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3836704kB mlocked:276kB dirty:0kB writeback:0kB mapped:204kB shmem:1148kB slab_reclaimable:3300kB slab_unreclaimable:7128kB kernel_stack:272kB pagetables:7048kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1266 all_unreclaimable? yes
[Thu Jan 12 07:18:46 2017 < 0.000003>] lowmem_reserve[]: 0 0 26917 26917
[Thu Jan 12 07:18:46 2017 < 0.000002>] Node 0 Normal free:30364kB min:59300kB low:74124kB high:88948kB active_anon:27000504kB inactive_anon:288kB active_file:292kB inactive_file:0kB unevictable:4384kB isolated(anon):0kB isolated(file):0kB present:28067840kB managed:27563592kB mlocked:4384kB dirty:20kB writeback:0kB mapped:3276kB shmem:7872kB slab_reclaimable:29156kB slab_unreclaimable:62424kB kernel_stack:1824kB pagetables:58972kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:528 all_unreclaimable? yes
[Thu Jan 12 07:18:46 2017 < 0.000004>] lowmem_reserve[]: 0 0 0 0
[Thu Jan 12 07:18:46 2017 < 0.000002>] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15904kB
[Thu Jan 12 07:18:46 2017 < 0.000008>] Node 0 DMA32: 1588*4kB (UEM) 911*8kB (UEM) 405*16kB (UEM) 232*32kB (UEM) 153*64kB (UEM) 96*128kB (UEM) 67*256kB (UEM) 30*512kB (UEM) 29*1024kB (UM) 0*2048kB 1*4096kB (M) = 115928kB
[Thu Jan 12 07:18:46 2017 < 0.000009>] Node 0 Normal: 761*4kB (UEM) 553*8kB (UEM) 309*16kB (UEM) 139*32kB (UEM) 57*64kB (EM) 23*128kB (EM) 28*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 30620kB
[Thu Jan 12 07:18:46 2017 < 0.000009>] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Thu Jan 12 07:18:46 2017 < 0.000001>] 3317 total pagecache pages
[Thu Jan 12 07:18:46 2017 < 0.000001>] 0 pages in swap cache
[Thu Jan 12 07:18:46 2017 < 0.000001>] Swap cache stats: add 0, delete 0, find 0/0
[Thu Jan 12 07:18:46 2017 < 0.000001>] Free swap = 0kB
[Thu Jan 12 07:18:46 2017 < 0.000001>] Total swap = 0kB
[Thu Jan 12 07:18:46 2017 < 0.000001>] 7999901 pages RAM
[Thu Jan 12 07:18:46 2017 < 0.000000>] 0 pages HighMem/MovableOnly
[Thu Jan 12 07:18:46 2017 < 0.000001>] 126062 pages reserved
[Thu Jan 12 07:18:46 2017 < 0.000001>] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Thu Jan 12 07:18:46 2017 < 0.000004>] [ 502] 0 502 5000 214 15 0 0 upstart-udev-br
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 505] 0 505 12392 288 27 0 -1000 systemd-udevd
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 644] 0 644 3814 100 12 0 0 upstart-socket-
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 676] 0 676 2554 645 8 0 0 dhclient
[Thu Jan 12 07:18:46 2017 < 0.000001>] [ 820] 102 820 9802 203 24 0 0 dbus-daemon
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 842] 0 842 10861 271 26 0 0 systemd-logind
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 845] 101 845 65114 4967 36 0 0 rsyslogd
[Thu Jan 12 07:18:46 2017 < 0.000001>] [ 868] 0 868 3818 112 12 0 0 upstart-file-br
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 947] 0 947 3633 204 12 0 0 getty
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 950] 0 950 3633 206 12 0 0 getty
[Thu Jan 12 07:18:46 2017 < 0.000001>] [ 954] 0 954 3633 205 12 0 0 getty
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 956] 0 956 3633 203 12 0 0 getty
[Thu Jan 12 07:18:46 2017 < 0.000001>] [ 958] 0 958 3633 203 12 0 0 getty
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 1002] 0 1002 4783 41 14 0 0 atd
[Thu Jan 12 07:18:46 2017 < 0.000001>] [ 1003] 0 1003 5912 219 17 0 0 cron
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 1013] 0 1013 15344 355 34 0 -1000 sshd
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 1038] 0 1038 4821 164 14 0 0 irqbalance
[Thu Jan 12 07:18:46 2017 < 0.000001>] [ 1049] 0 1049 1091 155 8 0 0 acpid
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 1137] 0 1137 3633 205 12 0 0 getty
[Thu Jan 12 07:18:46 2017 < 0.000001>] [ 1138] 0 1138 3195 189 12 0 0 getty
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 1276] 1000 1276 7166 1366 17 0 0 tmux
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 1277] 1000 1277 5291 685 15 0 0 bash
[Thu Jan 12 07:18:46 2017 < 0.000001>] [ 1296] 0 1296 16997 311 36 0 0 sudo
[Thu Jan 12 07:18:46 2017 < 0.000002>] [ 1297] 0 1297 5310 739 14 0 0 bash
[Thu Jan 12 07:18:46 2017 < 0.000002>] [10024] 0 10024 8848258 7649467 15938 0 0 asd
[Thu Jan 12 07:18:46 2017 < 0.000001>] [15399] 0 15399 15353 2612 34 0 0 gunicorn: maste
[Thu Jan 12 07:18:46 2017 < 0.000002>] [15405] 0 15405 19634 5416 42 0 0 gunicorn: worke
[Thu Jan 12 07:18:46 2017 < 0.000001>] [15408] 0 15408 1110 33 6 0 0 amc
[Thu Jan 12 07:18:46 2017 < 0.000002>] [15409] 0 15409 1110 57 6 0 0 amc
[Thu Jan 12 07:18:46 2017 < 0.000002>] [15415] 0 15415 1110 147 7 0 0 sh
[Thu Jan 12 07:18:46 2017 < 0.000001>] [19805] 0 19805 4235 1167 13 0 0 atop
[Thu Jan 12 07:18:46 2017 < 0.000002>] [17378] 999 17378 13695 1932 31 0 0 python
[Thu Jan 12 07:18:46 2017 < 0.000001>] [28982] 0 28982 1478 138 7 0 0 sleep
[Thu Jan 12 07:18:46 2017 < 0.000002>] Out of memory: Kill process 10024 (asd) score 946 or sacrifice child
[Thu Jan 12 07:18:46 2017 < 0.004401>] Killed process 10024 (asd) total-vm:35393032kB, anon-rss:30597868kB, file-rss:0kB
the config is the following:
service {
user root
group root
paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
paxos-protocol v4
pidfile /var/run/aerospike/asd.pid
service-threads 2
transaction-queues 4
transaction-threads-per-queue 4
proto-fd-max 90000
paxos-max-cluster-size 16
node-id-interface eth0
}
cluster {
mode dynamic
self-group-id 1
}
logging {
file /var/log/aerospike/aerospike.log {
context any info
}
}
network {
service {
address 172.31.34.199
port 3000
access-address 172.31.34.199
}
heartbeat {
mode mesh
port 3002
# mesh-seed-address-port 172.31.37.249 3002
interval 150
timeout 40
}
fabric {
address 172.31.34.199
port 3001
}
info {
address 172.31.34.199
port 3003
}
}
namespace primary {
replication-factor 2
memory-size 23G
default-ttl 90d
stop-writes-pct 90
high-water-memory-pct 85
high-water-disk-pct 80
write-commit-level-override master
storage-engine device {
device /dev/xvdf
defrag-lwm-pct 50
scheduler-mode noop
write-block-size 1M
data-in-memory true
}
}
it’s an r3.xlarge instance with 30G of RAM. How do I properly catch the memory leak in order to help you to fix it? It does’t happen without “data-in-memory true” option