How To identify read hotkeys


#1

How To identify read hotkeys

Context

Read hotkeys could be difficult to determine as there are no stats available to track concurrent read request for the same key when running in AP mode (when running in strong-consistency mode, read transactions will be serialized through the rw-hash along with write transactions).

As of version 3.16, the rw-client log context has a detail level log line to show all client transactions that get as far as successfully reserving a partition (includes namespace, client-IP, digest, and whether read/write/delete/udf):

# Turn detail level logging for rw-client context
asinfo -v "set-log:id=0;rw-client=detail"
# Turn back to info
asinfo -v "set-log:id=0;rw-client=info"

Example output:

{ns_name} client 10.0.3.182:51160 write <Digest>:0x8df238affec6f8e3a2c22d6c54c91c5bc4f3ff81

WARNING - Turning this detail log could cause an overwhelming amount of log lines printed. It should be enough to turn this on for a few seconds in order to detect hotkeys.

The rest of this article focuses on the use of TCPdump to look for the Digest Id of a transaction along with linux tools like sort and uniq to get the highest used digest keys.

Capturing the 20-bytes transaction DIGEST ID with tcpdump in real-time:

Based on the pattern 00001504 and converting the tcpdump into HEX we can use the following script to capture the transaction digest in realtime , output to stdout and halt it with a CTRL-C:


sudo tcpdump -n -K -vvv -XX -s0 port 3000 and not port 22 -i any|egrep -v 'IP|length'|cut -d ' ' -f 3-10|xargs|sed -e 's/ //g'|grep -oE '\00001504[[:alnum:]]{40}'

Example:


vagrant@vagrant-ubuntu-trusty-64:~$ sudo tcpdump -n -K -vvv -XX -s0 port 3000 and not port 22 -i any|egrep -v 'IP|length'|cut -d ' ' -f 3-10|xargs|sed -e 's/ //g'|grep -oE '\00001504[[:alnum:]]{40}'
 
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
 
00001504f59124986e96ad175b374c9487945bbcad537b74
 
00001504f59124986e96ad175b374c9487945bbcad537b74

00001504f59124986e96ad175b374c9487945bbcad537b74
 
00001504f59124986e96ad175b374c9487945bbcad537b74
 
00001504f59124986e96ad175b374c9487945bbcad537b74
 
00001504f59124986e96ad175b374c9487945bbcad537b74

Output can also be saved to a file using command line timeout (60 seconds) and tee to a file (hotkeys.txt):

timeout 60 sudo tcpdump -n -K -vvv -XX -s0 port 3000 and not port 22 -i any|egrep -v 'IP|length'|cut -d ' ' -f 3-10|xargs|sed -e 's/ //g'|grep -oE '\00001504[[:alnum:]]{40}'|tee hotkeys.txt

The final step would be to sort the output and use uniq command line tool to get the digest of the keys that have been hit the most:

cat hotkeys.txt|sort|uniq -c|sort -n

Example:

    114 00001504463a09ea5b836a85aa6d3549b6bdbecc44e9fcf2
    114 0000150484c403a3a6a5e4796147200e539818d8e5f31ce5
    114 00001504daf9ff56e8e5df215fbb18c3977c94086fe61eea
    114 00001504ec72093b6a4f8d80ed63b1cb57bfc0ff1c4d8adc
    116 00001504122de28b8362f0c4e88b40b0a0b1307889020a69
    117 00001504a7263d75adcbf319f7d0a98623704cbf7bbd06a9
    123 00001504a7dadfe745cf6163369e84acfcb3a13207950963
    1008 0000150413c7017bbc4fbe2dd25a2ef7aa996e56cfcf9913

Capturing the 20-bytes transaction DIGEST ID from a .pcap file:

The following script can read a tcpdump pcap file in a directory and create an output file containing only the DIGEST ID captured:

Save all pcaps for affected server in a sub-directory and cd to that directory to run the folowing script:

for i in `ls|grep -v dlog`; do tcpdump -XX -s0 -r ${i}|grep -v '>'|cut -d ' ' -f 3-10|tr "\n" " "|sed -e 's/ //g'|grep -oE '\00001504[[:alnum:]]{40}' >> digest.dlog; done;

Use Wireshark and filter by the most used 20 bytes Digest key:

Use the following filter: data.data contains <XX:YY:ZZ:…:… >

where <XX:YY:ZZ:…:… > is a 20 bytes DIGEST

Check the timestamps and confirm that these requests occured in the same time range.

Example:

for i in `ls|grep -v dlog`; do tcpdump -XX -s0 -r ${i}|grep -v '>'|cut -d ' ' -f 3-10|tr "\n" " "|sed -e 's/ //g'|grep -oE '\00001504[[:alnum:]]{40}' >> digest_case_XXXX.dlog; done;
reading from file test.pcap, link-type LINUX_SLL (Linux cooked)
 
   1132 00001504f59124986e96ad175b374c9487945bbcad537b74
   

DIGEST should be the 20 bytes after 00001504.

Use wireshark and filter by the most used key to confirm rate of transactions for :

data.data contains f5:91:24:98:6e:96:ad:17:5b:37:4c:94:87:94:5b:bc:ad:53:7b:74

Notes

Above scripts can be modified to use specific ports and interface. It is recommended to check the digest in Wireshark and confirm the frequency of the transactions qualifies as a hotkey.

Get specific record details:

The DIGEST should be 40 hex digits after "00001504" pattern generated from tcpdump.

0000150484ea2a0c75889622f6b342c6ff6f79c1c613b182 => 84ea2a0c75889622f6b342c6ff6f79c1c613b182

Use the below defined UDF to get details like Primary Key (if key is stored) and Set Name for that given digest.

function get_details(rec)
	x = "SetName: "

	if record.setname(rec) then
		x = x .. record.setname(rec)
		if record.key(rec) then
			x = x .. ", Key: "
			x = x .. record.key(rec)
		end
	else
		return "[ERROR]-RECORD_NOT_FOUND"
	end

	return x
end
aql> execute filename.get_details() on test where DIGEST="5c4ac062dd2e7848650f4ed505dae88ba9b19856"
+----------------------------+
| get_details                |
+----------------------------+
| "SetName: testset, Key: 2" |
+----------------------------+
1 row in set (0.002 secs)

OK

aql> execute filename.get_details() on test where DIGEST="F59124986E96AD175B374C9487945BBCAD537B74"
+----------------------------+
| get_details                |
+----------------------------+
| "[ERROR]-RECORD_NOT_FOUND" |
+----------------------------+
1 row in set (0.001 secs)

OK

Keywords

Read hotkey tcpdump latency

Timestamp

09/14/2018