As the doc says, Aerospike applies a hash algorithm to the key to get a hash key for a record. What happens when two records have the same hash key (a hash collision)? Will the Aerospike server notice that there is more than one record under the same hash key and compare the original keys to determine which is the real expected record? If so, performance for records with duplicated hash keys will always be lower…
Aerospike's hash function is based on RIPEMD-160, for which no practical collision attack is known. The probability of an accidental collision is infinitesimally small (effectively nonexistent in practice), even for billions of keys. Here is some research on the collision resistance of RIPEMD-160:
On the Collision Resistance of RIPEMD-160: https://online.tugraz.at/tug_online/voe_main2.getvolltext?pCurrPk=17675
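To put a rough number on "infinitesimally small", here is a back-of-the-envelope sketch using the standard birthday-bound approximation p ≈ n² / 2^(b+1) for n keys and a b-bit digest. It assumes the hash behaves like a uniform random function, which is an idealization, and the class and method names are made up for illustration; they are not part of any Aerospike API.

```java
final class CollisionOdds {
    // Birthday-bound approximation for the chance that at least two of
    // `keys` uniformly random digests of `digestBits` bits coincide.
    // Only meaningful while the result is much smaller than 1.
    static double approxCollisionProbability(double keys, int digestBits) {
        return keys * keys / Math.pow(2.0, digestBits + 1);
    }

    public static void main(String[] args) {
        // RIPEMD-160 produces a 160-bit (20-byte) digest.
        double[] keyCounts = {1e9, 1e12, 1e15};
        for (double n : keyCounts) {
            System.out.printf("%.0e keys -> p ~ %.3e%n",
                    n, approxCollisionProbability(n, 160));
        }
    }
}
```

For a billion keys this estimate comes out on the order of 1e-31, which is why the server does not need to compare original keys on every lookup.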
Relying on the robustness of the RIPEMD-160 hash function, Aerospike does not store the user-defined key on the server by default. Instead, the user key is converted to a hash digest, which is then used to identify the record. If the user key needs to persist on the server, use one of the following methods:
- Ask the database to explicitly store the key and detect collisions (e.g., in Java, set WritePolicy.sendKey to true). In this case, the key is sent to the server for storage on writes and is returned by multi-record scans and queries. Collisions are detected and a log message is printed. If you see one of these messages, please report it. You may even be able to write a research paper based on the two keys that caused the collision!
- Explicitly store and retrieve the user key in an application defined bin.
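To illustrate why storing the key is what makes a collision detectable, here is a toy in-memory model, not Aerospike's server code. Everything in it is hypothetical: the class `DigestStoreSketch`, its methods, and the `sendKey` parameter (named after the client policy flag only for readability). Since the JDK does not ship RIPEMD-160, the sketch substitutes a truncated SHA-256 digest, and the digest length is configurable so that collisions can be forced with a tiny digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class DigestStoreSketch {
    // Hypothetical record: the value, plus the user key only if it was sent.
    static final class Entry {
        final String storedKey; // null when the write did not send the key
        final String value;

        Entry(String storedKey, String value) {
            this.storedKey = storedKey;
            this.value = value;
        }
    }

    private final Map<String, Entry> index = new HashMap<>();
    private final int digestBytes;

    // digestBytes must be between 1 and 32 (the SHA-256 output size).
    DigestStoreSketch(int digestBytes) {
        this.digestBytes = digestBytes;
    }

    // Stand-in digest: SHA-256 truncated to digestBytes, hex encoded.
    String digest(String userKey) {
        try {
            byte[] full = MessageDigest.getInstance("SHA-256")
                    .digest(userKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < digestBytes; i++) {
                hex.append(String.format("%02x", full[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Records are indexed by digest only. A collision can be noticed only
    // when this write sends its key AND the existing record stored one.
    // Returns false when the two keys differ; otherwise writes the record.
    boolean put(String userKey, String value, boolean sendKey) {
        String d = digest(userKey);
        Entry existing = index.get(d);
        if (sendKey && existing != null && existing.storedKey != null
                && !existing.storedKey.equals(userKey)) {
            return false; // same digest, different user key
        }
        index.put(d, new Entry(sendKey ? userKey : null, value));
        return true;
    }

    // Lookup goes through the digest; the original key is never compared.
    String get(String userKey) {
        Entry e = index.get(digest(userKey));
        return e == null ? null : e.value;
    }
}
```

With `new DigestStoreSketch(1)` the digest has only 256 possible values, so writing a few hundred distinct keys with `sendKey` set to true is guaranteed to make `put` return false at some point. Without stored keys the same writes would silently overwrite each other, and with a 20-byte digest the situation does not arise in practice.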
Hope this helps.
Thanks for your detailed explanation.