Memory usage / leak (bug)

Hi,

I have an asd process which uses a lot of memory:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                 
22141 root      20   0 22.3g  15g 2180 S  330 48.2  34870:28 asd    

I have a cluster of 7 nodes and one namespace containing 6 GB of data with replication factor 2 (14 GB distributed across the cluster). The situation is identical on all nodes.

The only solution I have found to recover the memory is to perform a rolling restart of the Aerospike daemons.

Any hints on whether this behaviour is normal?

Regards,

Aerospike version used: 3.5.9
Kernel: 3.13.0-48

This isn’t normal. Could you provide the output of the following commands? You may need to anonymize information in this output (such as IP addresses and namespace names).

asadm -e "info"
asadm -e "show distribution"
$ asadm -e "info"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Service Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node   Build   Cluster      Cluster     Cluster    Free   Free   Migrates      Principal   Objects      Uptime   
           .       .      Size   Visibility   Integrity   Disk%   Mem%          .              .         .           .   
192.168.0.1    3.5.9         7   True         True            0     99   (0,0)      192.168.0.41   4.877 M   184:50:29   
192.168.0.21   3.5.9         7   True         True            0     99   (0,0)      192.168.0.41   4.842 M   184:38:19   
192.168.0.24   3.5.9         7   True         True            0     99   (0,0)      192.168.0.41   4.759 M   185:21:13   
192.168.0.28   3.5.9         7   True         True            0     99   (0,0)      192.168.0.41   4.433 M   184:17:23   
192.168.0.41   3.5.9         7   True         True            0     99   (0,0)      192.168.0.41   4.168 M   184:21:24   
192.168.0.42   3.5.9         7   True         True            0     99   (0,0)      192.168.0.41   4.131 M   184:25:17   
192.168.0.48   3.5.9         7   True         True            0     99   (0,0)      192.168.0.41   4.733 M   185:53:20   
Number of rows: 7

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node               Node                Fqdn                  Ip   Client     Current     HB         HB   
           .                 Id                   .                   .    Conns        Time   Self    Foreign   
192.168.0.1    BB9DCC288902500    192.168.0.18:3000   192.168.0.18:3000     7335   176369922      0   35278322   
192.168.0.21   BB95C0FF0902500    192.168.0.21:3000   192.168.0.21:3000     7319   176369922      0   26429307   
192.168.0.24   BB95E0FF0902500    192.168.0.24:3000   192.168.0.24:3000     7328   176369922      0   23651214   
192.168.0.28   BB9828DF2902500    192.168.0.28:3000   192.168.0.28:3000     7321   176369922      0   23671739   
192.168.0.41   *BB9E8F44F902500   192.168.0.41:3000   192.168.0.41:3000     7319   176369922      0   23683092   
192.168.0.42   BB93EE74E902500    192.168.0.42:3000   192.168.0.42:3000     7323   176369922      0   38079896   
192.168.0.48   BB948B34E902500    192.168.0.48:3000   192.168.0.48:3000     7325   176369921      0   35464940   
Number of rows: 7

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node   Namespace   Evictions   Objects     Repl     Stop     HWM          Mem     Mem    HWM      Stop   
           .           .           .         .   Factor   Writes   Disk%         Used   Used%   Mem%   Writes%   
192.168.0.1    redis               0   4.877 M        2    false   50      868.981 MB       1     60        90   
192.168.0.21   redis               0   4.842 M        2    false   50      818.702 MB       1     60        90   
192.168.0.24   redis               0   4.759 M        2    false   50      814.252 MB       1     60        90   
192.168.0.28   redis               0   4.433 M        2    false   50      800.990 MB       1     60        90   
192.168.0.41   redis               0   4.168 M        2    false   50      725.747 MB       1     60        90   
192.168.0.42   redis               0   4.131 M        2    false   50      716.019 MB       1     60        90   
192.168.0.48   redis               0   4.733 M        2    false   50      804.927 MB       1     60        90   
Number of rows: 7


$ asadm -e "show distribution"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~redis - TTL Distribution in Seconds~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Percentage of records having ttl less than or equal to value measured in Seconds     
        Node    10%    20%    30%    40%    50%     60%      70%      80%      90%     100%   
192.168.0.1     426    426    426    852    852    1704     3408     5538     7242    42600   
192.168.0.21    413    413    413    826    826    1652     3304     5369     7021    41300   
192.168.0.24    431    431    431    862    862    1293     3448     5603     6896    43100   
192.168.0.28    411    411    411    822    822    1644     3699     5343     6987    41100   
192.168.0.41    413    413    413    826    826    1239     3304     5369     7021    41300   
192.168.0.42    412    412    412    824    824    1648     3708     5356     7004    41200   
192.168.0.48   6048   6048   6048   6048   6048   24192   562464   586656   598752   604800   
Number of rows: 7

And memory usage graphs from two nodes, 0.48 and 0.18.

Green: free, yellow: used, blue: swap.

I restarted the nodes on the 28th.

Which features are you using? Secondary Indexes, UDFs, LDTs, lists and maps?

If you are using secondary indexes, are you indexing lists or maps? (Indexing these types hasn’t yet been officially announced.) If so, a memory leak for this type of index was fixed in 3.5.14.

We are using UDFs and lists. I have not defined any indexes.

Sorry to ask for the same information again, but I need it collected at the same time as the other commands:

asadm -e "info"
ps up $(pidof asd)
free -m
ipcs -m

All the output is here: Pastebin.com (asadm -e "info", etc.).

I can provide all the needed information, but I will need to reboot all the nodes before they run out of memory :slight_smile:

We also noticed you have ~7000-11000 client connections per server. This isn’t necessarily a problem as we have users that drive more than this, but if you aren’t expecting this number of client connections then it may help us narrow the search.

How many client instances are you running?

Per client instance, how many threads are they running?

Are your clients running in asynchronous or synchronous mode?

Also, while the memory is growing, could you run the following:

for i in {1..12}; do asinfo -v "jem-stats:"; sleep 600; done

This should take 2 hours to complete and will dump additional memory-usage stats to a special console file every 10 minutes.

After the above command completes, could you provide the output of the following via Pastebin?

cat /tmp/aerospike-console.$(pidof asd)

How many client instances are you running?

The Aerospike cluster has 7 nodes. There are 30 nodes running PHP code, with around 100 live PHP workers on each node (though probably around 400 PHP workers launched on each). I do not know how the Aerospike PHP driver handles connections, but I’m using a standard configuration. The cluster handles around 20k HTTP req/s.

Per client instance, how many threads are they running?

I do not know. The Aerospike PHP driver should use a pool of connections.

Are your clients running in asynchronous or synchronous mode?

Synchronous, but I’m not sure.

Jemalloc stats are available here: DL.FREE.FR

This is looking more like a memory leak; we would like to try to reproduce it locally. We are seeing that the leak seems to be in 64-byte allocations and is growing at a rate of about 20 MB per minute. Is there an access pattern on the client side that may correlate with these allocations?

Could you describe your client app a bit more?

Would you be able to share application/UDF code to help us reproduce this issue locally?

We are seeing that the leak seems to be in 64-byte allocations and is growing at a rate of about 20 MB per minute. Is there an access pattern on the client side that may correlate with these allocations?

The sum of used memory across all Aerospike nodes is pretty stable over time. The data changes, but not the amount of data.

Could you describe your client app a bit more?

We have three kinds of access:

  • standard access with get / put
  • increments with increment
  • Redis list emulation with UDFs: rpush, lpush, rpop, lpop.

The Lua implementation of these functions is here: Pastebin.com ("local MAX_INT = 4294967294 - 2 … local function EXISTS(rec, bin) …"); a minimal sketch of the approach is included below.

The ‘bin’ value is always ‘r’.

We can have big lists under each key, and pushes and pops happen at a high rate: at least one per incoming HTTP request, i.e. at least 20k/s.
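
To illustrate the approach, here is a minimal sketch (not the actual code from the Pastebin link) of how rpush and lpop can be implemented with the standard Aerospike Lua record/list API, using the single bin 'r' mentioned above; the function and variable names here are illustrative:

-- redis_list.lua : illustrative sketch, not the original UDF module
local BIN = 'r'   -- the list lives in a single bin, as noted above

-- RPUSH: append a value to the tail of the list, return the new length
function rpush(rec, value)
    local l = rec[BIN]
    if l == nil then
        l = list()
    end
    list.append(l, value)
    rec[BIN] = l
    if aerospike:exists(rec) then
        aerospike:update(rec)
    else
        aerospike:create(rec)
    end
    return list.size(l)
end

-- LPOP: remove and return the value at the head of the list
function lpop(rec)
    if not aerospike:exists(rec) then
        return nil
    end
    local l = rec[BIN]
    if l == nil or list.size(l) == 0 then
        return nil
    end
    local head = l[1]
    rec[BIN] = list.drop(l, 1)   -- new list without the first element
    aerospike:update(rec)
    return head
end

lpush and rpop would mirror these, prepending with list.prepend and trimming from the tail with list.take. Each call replaces the whole list bin, so very large lists pushed and popped at 20k/s make this a demanding workload for this pattern.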

Hello. Thanks for sending those answers, and especially the UDFs; they really help us understand your use case. We have used them in our own test application and have seen some phenomena that may replicate what you are seeing. We’ll keep working on it and keep you up to date with our results.

One question we have is: If the client load is suspended, does the memory use ever go down, or does it simply stay at the same level? Is there any way you can do that sort of test? (If it does go down, then it might indicate some sort of garbage collection latency, rather than an actual memory leak.)

Thanks.

Do you have any updates on this issue, especially on whether removing the client load leads to decreased memory use? We have been accumulating evidence that this is a GC latency issue rather than an actual memory leak. If you let us know what you see in your environment, and it fits the pattern we suspect, then we might be able to provide you with a way to address the GC latency.

So,

  • I have upgraded the cluster to 3.5.15. The memory leak problem seems to have disappeared. Great news!
  • We experienced performance slowdowns at high traffic. We think they are related to the UDF functions. If you have some hints about GC latency, let me know :smile:
  • It’s very difficult for me to stop traffic on the cluster for more than a few minutes.

Here is an example of latency. Note that UDFs are not the most impacted.

Any hints to reduce this latency?

Admin> show latency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~proxy Latency~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms   
           .                     Span         .      .      .       .   
192.168.0.1    07:19:21-GMT->07:19:31       0.0    0.0    0.0     0.0   
192.168.0.21   07:19:18-GMT->07:19:28       0.0    0.0    0.0     0.0   
192.168.0.24   07:19:26-GMT->07:19:36       0.0    0.0    0.0     0.0   
192.168.0.42   07:19:24-GMT->07:19:34       0.0    0.0    0.0     0.0   
192.168.0.48   07:19:20-GMT->07:19:30       0.0    0.0    0.0     0.0   
Number of rows: 5

~~~~~~~~~~~~~~~~~~~~~~~~~~~~query Latency~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms   
           .                     Span         .      .      .       .   
192.168.0.1    07:19:21-GMT->07:19:31       0.0    0.0    0.0     0.0   
192.168.0.21   07:19:18-GMT->07:19:28       0.0    0.0    0.0     0.0   
192.168.0.24   07:19:26-GMT->07:19:36       0.0    0.0    0.0     0.0   
192.168.0.42   07:19:24-GMT->07:19:34       0.0    0.0    0.0     0.0   
192.168.0.48   07:19:20-GMT->07:19:30       0.0    0.0    0.0     0.0   
Number of rows: 5

~~~~~~~~~~~~~~~~~~~~~~~~~~~~reads Latency~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms   
           .                     Span         .      .      .       .   
192.168.0.1    07:19:21-GMT->07:19:31   71193.2   0.37   0.01     0.0   
192.168.0.21   07:19:18-GMT->07:19:28   51730.4   0.08    0.0     0.0   
192.168.0.24   07:19:26-GMT->07:19:36   71845.3   0.14    0.0     0.0   
192.168.0.42   07:19:24-GMT->07:19:34   55501.6   0.15    0.0     0.0   
192.168.0.48   07:19:20-GMT->07:19:30   71567.3   0.16    0.0     0.0   
Number of rows: 5

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~udf Latency~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms   
           .                     Span         .      .      .       .   
192.168.0.1    07:19:21-GMT->07:19:31     932.5    0.0    0.0     0.0   
192.168.0.21   07:19:18-GMT->07:19:28     924.2    0.0    0.0     0.0   
192.168.0.24   07:19:26-GMT->07:19:36     973.1   0.01    0.0     0.0   
192.168.0.42   07:19:24-GMT->07:19:34     956.7    0.0    0.0     0.0   
192.168.0.48   07:19:20-GMT->07:19:30     955.8    0.0    0.0     0.0   
Number of rows: 5

~~~~~~~~~~~~~~~~~~~~~~~~~writes_master Latency~~~~~~~~~~~~~~~~~~~~~~~~~
        Node                     Time   Ops/Sec    >1Ms    >8Ms   >64Ms   
           .                     Span         .       .       .       .   
192.168.0.1    07:19:21-GMT->07:19:31      19.8   100.0   100.0   100.0   
192.168.0.21   07:19:18-GMT->07:19:28      41.2   100.0   100.0   100.0   
192.168.0.24   07:19:26-GMT->07:19:36       5.4   100.0   100.0   100.0   
192.168.0.42   07:19:24-GMT->07:19:34       0.0     0.0     0.0     0.0   
192.168.0.48   07:19:20-GMT->07:19:30      60.9   100.0   100.0   100.0   
Number of rows: 5

~~~~~~~~~~~~~~~~~~~~~~~~~writes_reply Latency~~~~~~~~~~~~~~~~~~~~~~~~
        Node                     Time   Ops/Sec   >1Ms   >8Ms   >64Ms   
           .                     Span         .      .      .       .   
192.168.0.1    07:19:21-GMT->07:19:31      68.9   3.05    2.9    2.32   
192.168.0.21   07:19:18-GMT->07:19:28      74.5   0.94   0.81    0.81   
192.168.0.24   07:19:26-GMT->07:19:36      79.1   1.14   0.63    0.63   
192.168.0.42   07:19:24-GMT->07:19:34      67.9   0.15   0.15    0.15   
192.168.0.48   07:19:20-GMT->07:19:30      71.7   1.12   0.98    0.98

Thanks for the info. Glad it’s working for you now.

Currently, the Lua GC parameters are hard-coded as “#define” constants in the code. In the Aerospike Server Community Edition open source code, you can find them using:

$ grep "#define LUA.*GC" aerospike-server/modules/mod-lua/src/main/mod_lua.c

There is a comment block at that location in the source code describing what is known about controlling Lua GC. At this point, it would necessitate a custom build to alter these parameters, e.g., to do GC more frequently to reduce GC latency and avoid out-of-memory situations. If 3.5.15 is working for you now, however, perhaps this is not necessary?
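
For background, those constants ultimately drive Lua’s standard garbage-collector controls (presumably through the C-side lua_gc() equivalents). A small illustration of the knobs involved, assuming the stock Lua 5.1 collectgarbage API; the values below are examples, not the ones hard-coded in mod_lua.c:

-- Standard Lua 5.1 GC tuning knobs (example values only)
collectgarbage("setpause", 100)    -- start a new GC cycle sooner; the default of 200 means
                                   -- waiting until memory use doubles since the last cycle
collectgarbage("setstepmul", 400)  -- do more GC work per incremental step; default is 200

-- Work can also be forced explicitly, e.g. between UDF invocations:
collectgarbage("step")             -- run one incremental GC step
collectgarbage("collect")          -- run a full collection
print(collectgarbage("count"))     -- current Lua heap size, in KB

Lowering the pause and/or raising the step multiplier makes collection more aggressive, trading some CPU for a smaller, more stable Lua heap; that is the direction a custom build would take to do GC more frequently.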

Please let us know if you have more questions or comments or updates about your experiences using Aerospike. Thanks!

In the end, the leak is still there, but the memory increase is slower with 3.5.15.

Any hints to fix or work around it?

A few questions:

1) What version of the PHP client are you using?
2) Are you using AS_POLICY_COMMIT_LEVEL_MASTER? (If so, this could cause 100% “writes_master” latency to be erroneously reported.)

Thanks.