How To use rw-client logging to identify a hot digest?

How To use rw-client logging to identify a hot digest?

Context

A hotkey is defined as a key that is being accessed in a disproportionate manner in comparison to the norm for the dataset. Hotkeys can be problematic as they can create a point load on the cluster which normally uses a balanced workload to extract maximum efficiency. The hotkey itself must be solved at the application level, however, it is possible to use the rw-client logging context to identify which digests are being accessed. How is the output from this logging context actually used.

The output for server 5.7 follows:

Oct 20 2021 17:29:25 GMT: DETAIL (rw-client): (thr_tsvc.c:108) {bar} digest d428aeb3bc0f350a56fbf77d882f175063b9fd5f client 127.0.0.1:51488 write

The output for server 4.2 follows. Digest is uppercase and without transaction type at the end, and the client IP address is before the digest:

Oct 20 2021 18:32:33 GMT: DETAIL (rw-client): (thr_tsvc.c:122) {myNamespace} client 127.0.0.1:59930 write <Digest>:0xd428aeb3bc0f350a56fbf77d882f175063b9fd5f

Method

Setting the rw-client logging to detail logs every single digest accessed by clients into the log file (typically aerospike.log or in journald) on each node. This creates a lot of data and shouldn’t be kept on for longer than necessary. Run the following command against the aerospike.log file (or whichever log file is used) in order to sort the different digest accessed:

cat aerospike.log|grep rw-client|grep -i digest|sed -e 's/^.*igest//g'|tr ' ' ':'|awk 'BEGIN { FS = ":" } ;{print $2,$4}'|sort|uniq -c|sort -nr

The output would look as follows:

cat aerospike.log|grep rw-client|grep -i digest|sed -e 's/^.*igest//g'|tr ' ' ':'|awk 'BEGIN { FS = ":" } ;{print $2,$4}'|sort|uniq -c|sort -nr | more 
 193 77d22f0f7620c0cf97aacef92460784f711f5d95 11.118.180.12
 186 77d22f0f7620c0cf97aacef92460784f711f5d95 11.118.180.229
 115 77d22f0f7620c0cf97aacef92460784f711f5d95 11.118.181.35
  77 77d22f0f7620c0cf97aacef92460784f711f5d95 11.118.178.219
  69 77d22f0f7620c0cf97aacef92460784f711f5d95 11.118.177.72
  64 77d22f0f7620c0cf97aacef92460784f711f5d95 11.118.177.162
  62 77d22f0f7620c0cf97aacef92460784f711f5d95 11.118.176.118
  61 77d22f0f7620c0cf97aacef92460784f711f5d95 11.118.176.104
  60 77d22f0f7620c0cf97aacef92460784f711f5d95 11.118.177.15
  58 77d22f0f7620c0cf97aacef92460784f711f5d95 11.118.176.142
  14 90127ff46818c064e9fb1e0a52327c3070a2ee61 11.118.180.229
  ...
  ...
   1 f60e2776113d79ddd86091c2379350a7d8223b1a 11.118.177.162
   1 f60e2776113d79ddd86091c2379350a7d8223b1a 11.118.176.118
   1 f60e2776113d79ddd86091c2379350a7d8223b1a 11.118.176.104

From the results above it is clear that 77d22f0f7620c0cf97aacef92460784f711f5d95 is the hottest digest.

If you do not need the client IP address, then this should work for older and newer versions of Aerospike:

 awk '/rw-client/ { if (length($13) > length($11)) print $13 ;else print $11}' aerospike.log | sort | uniq -c | sort -n | tail -5

For server version 4.9 or later, the output is as follows:

      1 909d46d66622d497215d12650357b381939c9211
      1 d513cbf42ac7b2c18f2ce307266fe7acca74e86c
      1 ff1eddbba6db16669beb8a6b0137b530d1c79294
      2 4f45c87d8a53b69049d8f7ab355385843a968d2e
      5 d428aeb3bc0f350a56fbf77d882f175063b9fd5f

For older versions, the result is like this which includes the Digest keyword:

      2 <Digest>:0x50eaca01ed2d01c2da9cb9284cb00fd022140989
      2 <Digest>:0x540e44fcd1470434bbebdb5ff0b36a6046901c9d
      2 <Digest>:0xff1eddbba6db16669beb8a6b0137b530d1c79294
      3 <Digest>:0x901376867278cedc0d7607f04bcdbf156fc13082
      7 <Digest>:0xd428aeb3bc0f350a56fbf77d882f175063b9fd5f

To confirm that it is really the hotkey, we should grep for the digest again, for example:

grep d428aeb3bc0f350a56fbf77d882f175063b9fd5f aerospike.log | head
Oct 20 2021 17:29:25 GMT: DETAIL (rw-client): (thr_tsvc.c:108) {bar} digest d428aeb3bc0f350a56fbf77d882f175063b9fd5f client 127.0.0.1:51488 write
Oct 20 2021 19:44:06 GMT: DETAIL (rw-client): (thr_tsvc.c:108) {bar} digest d428aeb3bc0f350a56fbf77d882f175063b9fd5f client 127.0.0.1:37554 write
Oct 20 2021 19:44:08 GMT: DETAIL (rw-client): (thr_tsvc.c:108) {bar} digest d428aeb3bc0f350a56fbf77d882f175063b9fd5f client 127.0.0.1:37644 write
Oct 20 2021 19:44:09 GMT: DETAIL (rw-client): (thr_tsvc.c:108) {bar} digest d428aeb3bc0f350a56fbf77d882f175063b9fd5f client 127.0.0.1:37738 write
Oct 20 2021 19:44:10 GMT: DETAIL (rw-client): (thr_tsvc.c:108) {bar} digest d428aeb3bc0f350a56fbf77d882f175063b9fd5f client 127.0.0.1:37798 write

In this case, the digest occurred at more than one second apart which would typically be acceptable.

We should also count the hot digest based on different types of transactions:

HOTDIGEST=d428aeb3bc0f350a56fbf77d882f175063b9fd5f
grep $HOTDIGEST aerospike.log | wc
grep $HOTDIGEST aerospike.log | grep read | wc
grep $HOTDIGEST aerospike.log | grep write | wc
grep $HOTDIGEST aerospike.log | grep delete| wc

If you observed only read transactions, then enable the read-page-cache configuration to alleviate the pressure on the storage subsystem (if not configured for data in memory).

If there is a mix of writes, deletes, and read, then it may not help because write transactions would invalidate the read page cache.

If you also want to show the client IP address, you can modify the above awk script as such:

awk '/rw-client/ { if (length($13) > length($11)) print $13,$11 ;else print $11,$13}'

For example:

awk '/rw-client/ { if (length($13) > length($11)) print $13,$11 ;else print $11,$13}' aerospike.log | sort | uniq -c | sort -n | tail -5
      1 <Digest>:0xd513cbf42ac7b2c18f2ce307266fe7acca74e86c 127.0.0.1:55220
      1 <Digest>:0xff1eddbba6db16669beb8a6b0137b530d1c79294 127.0.0.1:34020
      1 <Digest>:0xff1eddbba6db16669beb8a6b0137b530d1c79294 127.0.0.1:54678
      1 id=0;rw-client=detail executed:
      5 <Digest>:0xd428aeb3bc0f350a56fbf77d882f175063b9fd5f 127.0.0.1:59754

The above includes the line that enables rw-client, but it is counted only once so it should not affect finding the hot digest.

Notes

  • The rw-client logging context is detailed in the Aerospike Log Reference
  • Details on error 14 (hotkeys) messages can be found here.
  • Details on looking for read hotkeys can be found here.
  • The rw-client context can generate a lot of output when set to detail. It should be switched back to info as soon as investigations are completed.
    • For example:
      asinfo -v "set-log:id=0;rw-client=detail"
      sleep 10
      asinfo -v "set-log:id=0;rw-client=info"
      

Applies To

Server 4.0 or later.

Keywords

HOTKEY RW-CLIENT DIGEST LOGGING

Timestamp

October 2021