As we know, the key is hashed to a fixed 20 bytes string with RIPEMD160. Then choose the first 12 bits of the string(key’s digest) as partition id. I want to figure out that how the hash algorithm RIPEMD160 ensure the keys are evenly distributed on all 4096 partitions. Because partitions are evenly distributed on the cluster nodes.
RIPEMD160 hash generates fairly random bits based on the key. Randomness ensures even distribution.
Any set of bits can be picked to determine the partition ID. We just happen to pick first 12 bits.
So it is RIPEMD160‘s characteristic that decides a large number of keys will be evenly distributed on 4096 partition. For example, there are 4096000 keys, so every partition has nearly 1000 keys. Is it correct?
Another question is as follows. In server code, I notice there is a phase called ‘Partition Map’. You guys implemented this using FNV-1a hash and One-at-a-Time hash. After partition map, it seems that partitions are also evenly disteibuted on all cluster nodes. So it is these two hash functions that decide it?
I will appreciate for your patience answers.
That is right !!
I wonder how you find these three hash functions to solve distributed problem. but actually they work.