How to debug udf cache issue


#1

How to debug UDF cache related issue

Context

Lua files are cached in the server so that they don’t get reopened and recompiled over and over again. If one observes (via sysdig/strace) that the lua files are being opened frequently (not during registration), it is a potential performance bottleneck.

The following bug was addressed in releases 3.15.0.2 / 3.14.1.5 / 3.13.0.8:

[AER-5755] - (UDF) Allow Lua cache to freely increase to its limit as needed.

Method

We need to verify that “cache-disabled” is not set to false for udf:

http://www.aerospike.com/docs/udf/best_practices.html#overview

Then we need to enable “detail” log level for the udf module:

# start a fresh aerospike.log
mv /var/log/aerospike/aerospike.log aerospike.log.last
kill -HUP `pgrep asd`

asadm -e "asinfo -v 'log-set:id=0;udf=detail'"
asadm -e "asinfo -v 'log-set:id=0;info=debug'"
sleep 10
asadm -e "asinfo -v 'log-set:id=0;info=info'"
asadm -e "asinfo -v 'log-set:id=0;udf=info'"

# save and compress the aerospike.log
mv /var/log/aerospike/aerospike.log aerospike.log.debug
kill -HUP `pgrep asd`
gzip aerospike.log.debug

The “kill -HUP” command forces a re-open of log file (/var/log/aerospike/aerospike.log). It is optional but allows to have the extra volume of detailed log on a separate file.

We also enable “debug” for the info module to ensure we don’t read the lua file via asinfo protocol.

Example

When enabled “DETAIL”, debug logs will be included:

zgrep DETAIL aerospike.log.debug.gz  | wc
     65     808    7442

zgrep DEBUG aerospike.log.debug.gz  | wc
    205    3866   33412

zgrep DEBUG aerospike.log.debug.gz  | grep -v -i info | wc
    187    3645   32008

zgrep CACHE aerospike.log.debug.gz  | wc
      4      50     388

zgrep CACHE aerospike.log.debug.gz
Nov 13 2017 21:33:17 GMT: DETAIL (udf): (src/main/mod_lua.c:627) [CACHE] took state: sum_example
Nov 13 2017 21:33:17 GMT: DETAIL (udf): (src/main/mod_lua.c:637) [CACHE] Miss 0 : Total 14
Nov 13 2017 21:33:17 GMT: DETAIL (udf): (src/main/mod_lua.c:736) [CACHE] found entry: sum_example
Nov 13 2017 21:33:17 GMT: DETAIL (udf): (src/main/mod_lua.c:740) [CACHE] returning state: sum_example

Prior to AER-5755, the detail log will be something like this where sum_example.lua is the lua file:

Oct 31 2017 20:27:58 GMT: DETAIL (udf): (src/main/mod_lua.c:635) [CACHE] took state: sum_example (10)
Oct 31 2017 20:27:58 GMT: DETAIL (udf): (src/main/mod_lua.c:648) [CACHE] Miss 78595850 : Total 5236241417
Oct 31 2017 20:27:58 GMT: DETAIL (udf): (src/main/mod_lua.c:648) [CACHE] Miss 0 : Total 5236241418
Oct 31 2017 20:27:58 GMT: DETAIL (udf): (src/main/mod_lua.c:648) [CACHE] Miss 78595851 : Total 5236241419
Oct 31 2017 20:27:58 GMT: DETAIL (udf): (src/main/mod_lua.c:648) [CACHE] Miss 78595852 : Total 5236241420
Oct 31 2017 20:27:58 GMT: DETAIL (udf): (src/main/mod_lua.c:747) [CACHE] found entry: sum_example (10)

In versions that do not have the AER-5755 issue addressed, the cache initially has 10 entries per file at startup. It then grows as needed up to a maximum of 128, but only does so if the cache miss rate exceeds 2 percent. The above cache miss is 78595852/5236241420 = 1.5% preventing the cache entries to grow beyond 10.

Such limitation has been removed to allow further caching to take place.

Keywords

UDF CACHE DETAIL DEBUG LOG

Timestamp

15 Nov 2017