How to perform Hot Key Analysis?


#1

Hi,

How can I find the hottest N keys that are being queried and/or updated? I’m aware of the Hot Key error (code 14), but by then it may be too late, and it only applies to updates. How can this be done proactively?

Btw, I am aware that keys should be random and evenly distributed, but there are always some weird ones out there that we end up having to filter, e.g. empty strings, “null”, “N/A”, etc. These are most likely invalid, but unless we find them we can’t filter them out.

Thanks!


#2

This may not be so elegant, but you can give it a shot.

You can take a backup of the namespace/set without bin data (the -x option). This will dump all the records with their metadata, including the generation. The hottest keys will tend to have very high generation numbers, so you should be able to catch a good number of them. There is a risk of missing some hot keys, because the generation wraps around at 65535. If you take multiple samples, you should be able to catch most of them.
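To make the multiple-samples idea concrete, here is a minimal Python sketch that compares two samples and ranks keys by how much their generation advanced in between. It assumes you have already extracted a digest-to-generation mapping from each backup (the extraction step depends on the backup file format and is not shown), and it assumes the generation behaves like a 16-bit counter that wraps modulo 65536; adjust the modulus if the server wraps differently. The function names are made up for illustration.

```python
import heapq

# Assumption: generation is a 16-bit counter, so differences are taken modulo 65536.
GEN_MODULUS = 65536


def generation_deltas(earlier, later, modulus=GEN_MODULUS):
    """Estimate writes per key between two samples.

    earlier, later: dicts mapping digest (str) -> generation (int).
    Keys missing from the earlier sample are skipped (no baseline).
    """
    deltas = {}
    for digest, gen_later in later.items():
        gen_earlier = earlier.get(digest)
        if gen_earlier is None:
            continue
        # The modulo keeps the delta correct across a single wrap-around.
        deltas[digest] = (gen_later - gen_earlier) % modulus
    return deltas


def hottest_written_keys(earlier, later, n=10):
    """Return the n (digest, delta) pairs with the largest generation delta."""
    deltas = generation_deltas(earlier, later)
    return heapq.nlargest(n, deltas.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample_1 = {"a": 10, "b": 65530, "c": 5}
    sample_2 = {"a": 12, "b": 4, "c": 5, "d": 1}
    for digest, delta in hottest_written_keys(sample_1, sample_2, n=2):
        print(digest, delta)
```

A key written more than 65536 times between two samples would still be underestimated, so the samples need to be taken closer together than the time a hot key needs to wrap.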

Another way is to enable debug logging for the thr_rw module. Be careful, as this will print a lot of output to the error log file. We print the first 8 bytes of the digest. You can write a script to extract them and find the most frequently occurring ones.
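A script for that could look roughly like the Python sketch below. The exact layout of the debug lines is not shown in this thread, so the sketch simply assumes that the 8-byte digest prefix appears as a standalone run of 16 hex characters somewhere in each line; the regular expression and the sample lines are hypothetical and will need adjusting to the real log output.

```python
import re
from collections import Counter

# Assumption: the 8-byte digest prefix is logged as 16 consecutive hex characters.
DIGEST_RE = re.compile(r"\b[0-9a-fA-F]{16}\b")


def count_digest_prefixes(lines):
    """Count how often each 16-hex-character token occurs in the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(match.lower() for match in DIGEST_RE.findall(line))
    return counts


def top_digests(lines, n=10):
    """Return the n most frequent digest prefixes as (prefix, count) pairs."""
    return count_digest_prefixes(lines).most_common(n)


if __name__ == "__main__":
    # Hypothetical log lines, only to show the counting; real lines will differ.
    sample = [
        "DEBUG (rw): digest 1a2b3c4d5e6f7081 read",
        "DEBUG (rw): digest 1a2b3c4d5e6f7081 read",
        "DEBUG (rw): digest ffeeddccbbaa9988 write",
    ]
    for prefix, count in top_digests(sample, n=2):
        print(prefix, count)
```

Because the function accepts any iterable of lines, an open file object can be passed directly, so a large log is processed line by line without loading it into memory.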


#3

@sunil,

This makes sense for hot writes, but my use case is infrequent writes with far more frequent reads. For reads the generation won’t change, so this won’t help.

I was thinking of periodically running tcpdump and trying to get something out of it, but that output is hard to parse.


#4

You are right. The backup idea will not work for reads, but the debug logging of the thr_rw module can. Parsing the log should be easier than parsing the tcpdump output, as the Aerospike messages are more human-friendly.