How can one go about analyzing which are the hottest N keys that are being queried and/or updated? I’m aware of Hot Key error code 14 but that might be a bit too late (and also only applies to updates) - how can one do it proactively?
Btw, I am aware that keys should be random and evenly distributed but it always happens that there are some weird ones out there that we end up having to filter - e.g. empty strings, “null”, “N/A”, etc - things that are most likely invalid but unless we find them we can’t filter them out.
This may not be so elegant, but you can give it a shot.
You can take a backup of the namespace/set without bin data (-x option). This will dump all the records with their metadata, including generation. The hottest keys will tend to have very high generation numbers, so you should be able to catch a good number of them this way. There is a risk of missing some hot keys, because the generation wraps around at 65535. If you take multiple samples, you should still be able to catch a fairly good number of hot keys.
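To make the multiple-samples idea concrete, here is a rough Python sketch. It assumes you have already pulled (key, generation) pairs out of each backup into a dict; that extraction step is not shown, and the function name and sample data are made up for illustration. It keeps the highest generation seen for each key across all samples, so a key whose generation wrapped in one sample can still surface from another.

```python
import heapq


def top_generation_keys(samples, n=10):
    """Return the n (key, generation) pairs with the highest generation
    seen in any sample. `samples` is a list of dicts mapping a key
    identifier (e.g. a digest string) to its generation number."""
    peak = {}
    for sample in samples:
        for key, gen in sample.items():
            # Keep the largest generation observed for this key so far.
            if gen > peak.get(key, 0):
                peak[key] = gen
    return heapq.nlargest(n, peak.items(), key=lambda kv: kv[1])


# Hypothetical data: "a1b2" wrapped around between the two samples,
# but its high generation in the first sample still gets it noticed.
samples = [
    {"a1b2": 65530, "c3d4": 12, "e5f6": 3},
    {"a1b2": 40, "c3d4": 15, "e5f6": 3},
]
print(top_generation_keys(samples, n=2))
```

This only ranks by write activity, since generation changes only on updates.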
Another way is to enable debug logging for the thr_rw module. Be careful, as this will print a lot of output to the error log file. We print the first 8 bytes of the digest. You can write a script to catch them and find the most commonly occurring ones.
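A script along those lines could look roughly like the Python sketch below. The exact format of the debug lines is not shown in this thread, so the sketch simply assumes the 8-byte digest prefix appears as a standalone run of 16 hex characters somewhere in the line; adjust the regular expression to whatever your log actually contains. The sample lines and the function name are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the digest prefix shows up as a standalone 16-hex-character
# token. Other tokens of the same shape would be counted too, so tighten
# this pattern once you have seen the real log lines.
DIGEST_RE = re.compile(r"\b[0-9a-fA-F]{16}\b")


def hottest_digests(lines, n=10):
    """Count digest-like tokens across log lines and return the n most common."""
    counts = Counter()
    for line in lines:
        counts.update(token.lower() for token in DIGEST_RE.findall(line))
    return counts.most_common(n)


# Hypothetical log lines, not real server output.
sample_lines = [
    "DEBUG (rw): read digest 1a2b3c4d5e6f7081",
    "DEBUG (rw): read digest 99aabbccddeeff00",
    "DEBUG (rw): read digest 1A2B3C4D5E6F7081",
]
print(hottest_digests(sample_lines, n=1))
```

To run it against a real file, pass the open file object as `lines`, for example `with open(path) as f: hottest_digests(f)`, where `path` points at your log file.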
This makes sense for hot writes, but my use case is infrequent writes with much more frequent reads. For reads the generation won’t change, so this won’t help.
I was thinking of periodically running tcpdump and trying to get something out of that, but its output is hard to parse.
You are right. The backup idea will not work for reads. But the debug logging of the thr_rw module can work for reads. Parsing it can be easier than parsing the tcpdump output, as the Aerospike log messages are more human-friendly.