FAQ - How Keys and digests are used in Aerospike

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

Synopsis

A primary key uniquely identifies a record in the Aerospike database within a given namespace, and it is used to address that record for all operations. The application works with the key, while the database addresses the record by its digest. See http://www.aerospike.com/docs/architecture/data-model.html

A primary key is also a data structure (or class) consisting of four fields: namespace, set name, digest, and user key. For example, the Java client defines a Key class: https://www.aerospike.com/apidocs/java/com/aerospike/client/Key.html

  1. Namespace: The namespace the record belongs to.
  2. Set name: The set the record belongs to.
  3. Digest: A 160-bit (20-byte) value generated by hashing the set name and the user key (including its type) with the RIPEMD-160 hashing function.
  4. User key: The value field of the Key structure, that is, the original key supplied by the application.

See Upper Sizing Bounds and Naming.

1. If the “put” request contains both a key and a digest, which one of these is used by the client in its “put” method?

If the key object provided by the application to the client contains both a key and a digest, then the client will ignore the provided digest and compute a new digest from the provided key.

2. How expensive is hashing the key into a digest? In a high-traffic application, would there be a significant benefit from using the digest instead of the key, and so not having to hash the key?

The digest is computed using the RIPEMD-160 hashing function. The cost of generating the hash depends on the type and size of the key value. It would be best to benchmark both approaches with typical key lengths and on the actual production hardware to determine which one performs faster. But in our experience, the time taken for hashing a key to a digest is negligible in the overall time taken for the transaction.

3. Can we get the user Key from the digest?

No. The digest is a one-way RIPEMD-160 hash of the set name and the user key, so there is no way to recover the user key from the digest, even when the set name is known.

4. How do we construct a Key Structure using a digest?

The clients provide APIs to generate the key structure from a given digest.

Java API (http://www.aerospike.com/apidocs/java/com/aerospike/client/Key.html)

public Key(String namespace,
   byte[] digest,
   String setName,
   Value userKey)

Initialize key from namespace, digest, optional set name and optional userKey.

Parameters:

  • namespace - namespace
  • digest - unique server hash value
  • setName - optional set name, enter null when set does not exist
  • userKey - optional original user key (not hash digest), restricted to string, integer or bytes

C API

as_key * as_key_init_digest(as_key * key,
                            const as_namespace ns,
                            const as_set set,
                            const as_digest_value digest)

Initialize a stack allocated as_key with a digest.

as_digest_value digest = {0};
as_key key;
as_key_init_digest(&key, "ns", "set", digest);

5. How do we get the digest for a given key?

Java API

public static byte[] computeDigest(String setName,
                   Value key)
                            throws AerospikeException
Generate unique server hash value from set name, key type and user defined key. The hash function is RIPEMD-160 (a 160 bit hash).

Parameters:

  • setName - optional set name, enter null when set does not exist
  • key - record identifier, unique within set

Returns:

  • unique server hash value

C API

The digest is computed the first time the function is called. Subsequent calls return the previously calculated value.

as_digest * digest = as_key_digest(key);

Parameters:

  • key: The key to get the digest for.

Returns:

  • The digest for the key.

6. How do we access a record using a digest?

6.1. We construct a key object using the digest. As we do not know the user key, we can have a null value for it. The set name is also optional.

Key generatedkey = new Key("test_namespace", digest, "testset", null);

6.2. We get the record using the above generated key:

Record generatedRecord = aerospikeClient.get(writePolicy, generatedkey);

Example:

// digest is the byte[] taken from the original key (for example, originalKey.digest)
Key generatedkey = new Key("test", digest, null, null);
System.out.println("The constructed key object:: " + generatedkey);
Record generatedRecord = aerospikeClient.get(writePolicy, generatedkey);
System.out.println("The record that is retrieved:: " + generatedRecord);

Output:

The original key object:: test:testset:my_userkey:6feef8cd176660ebbcdd2c87604feb76bbbf64f5
The original digest:: [B@6d3c121b
The original record that is inserted:: (gen:13),(exp:242611414),(bins:(bin2:bin2_value2),(bin1:bin1_value1))


The constructed key object:: test:null:null:6feef8cd176660ebbcdd2c87604feb76bbbf64f5
The record that is retrieved:: (gen:14),(exp:242611414),(bins:(bin2:bin2_value2),(bin1:bin1_value1))
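
Note that the digest line in the output above prints as [B@6d3c121b, which is Java's default rendering of a byte array rather than the digest value itself. The following is a minimal sketch of a helper that converts between the 20-byte digest and the 40-character hex form seen in the key output and in AQL. DigestHex and its method names are hypothetical and are not part of the Aerospike client.

```java
/** Hypothetical helper (not part of the Aerospike client) for digest <-> hex conversion. */
final class DigestHex {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Render a digest as lowercase hex, e.g. for logging or an AQL DIGEST predicate. */
    static String toHex(byte[] digest) {
        char[] out = new char[digest.length * 2];
        for (int i = 0; i < digest.length; i++) {
            int v = digest[i] & 0xFF;            // treat the byte as unsigned
            out[2 * i] = HEX[v >>> 4];
            out[2 * i + 1] = HEX[v & 0x0F];
        }
        return new String(out);
    }

    /** Parse a 40-character hex string (upper or lower case) into a 20-byte digest. */
    static byte[] fromHex(String hex) {
        if (hex.length() != 40) {
            throw new IllegalArgumentException("expected 40 hex characters, got " + hex.length());
        }
        byte[] digest = new byte[20];
        for (int i = 0; i < digest.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("non-hex character near position " + (2 * i));
            }
            digest[i] = (byte) ((hi << 4) | lo);
        }
        return digest;
    }

    public static void main(String[] args) {
        // Round trip using the digest shown in the key output above.
        byte[] digest = fromHex("6feef8cd176660ebbcdd2c87604feb76bbbf64f5");
        System.out.println(toHex(digest));
    }
}
```

The byte array returned by fromHex can then be passed as the digest argument of the Key constructor described in question 4.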

7. How do we identify the “User key” from the digest?

As indicated in question 3, it is not possible to reverse the hash.

However, if the client application maintains some kind of dictionary of key structures and their associated digests, then one can easily look up the “User key” and “Set Name” from the digest.
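
A dictionary of this kind could look roughly like the sketch below. It is only an illustration: DigestIndex, Entry, remember and lookup are hypothetical names, and the digest is assumed to be the 20-byte array the application already holds for each key it builds (for example, the digest field of the Java client's Key object).

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical client-side index from digest to the set name and user key behind it. */
final class DigestIndex {
    /** Set name and user key remembered for one digest. */
    static final class Entry {
        final String setName;
        final String userKey;

        Entry(String setName, String userKey) {
            this.setName = setName;
            this.userKey = userKey;
        }
    }

    // ByteBuffer compares and hashes by content, unlike byte[], so it works as a map key.
    private final Map<ByteBuffer, Entry> byDigest = new ConcurrentHashMap<>();

    /** Remember the mapping; call this wherever the application builds its keys. */
    void remember(byte[] digest, String setName, String userKey) {
        // Copy the array so later changes by the caller cannot corrupt the map key.
        byDigest.put(ByteBuffer.wrap(digest.clone()), new Entry(setName, userKey));
    }

    /** Returns the remembered entry, or null if this digest was never recorded. */
    Entry lookup(byte[] digest) {
        return byDigest.get(ByteBuffer.wrap(digest));
    }
}
```

Such an index only covers keys the application itself has seen, and it grows with the number of distinct keys, so it is more practical for a bounded key space or a short debugging session.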

If the key is stored in the cluster (the client policy has “sendKey” set to true), then one can use the “explain select” command in AQL to show the actual “User Key”. Even when the key is not stored, showing the content details of the record may help identify the record.

If the key is stored, the UDF defined below returns the “User Key” and “Set Name” for a given digest. Refer to the AQL UDF Management documentation.

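function get_details(rec)
	-- Returns the set name and, when it is stored, the user key of the record.
	local x = "SetName: "

	if record.setname(rec) then
		x = x .. record.setname(rec)
		-- record.key() is only set when the record was written with sendKey enabled
		if record.key(rec) then
			x = x .. ", Key: "
			x = x .. tostring(record.key(rec))
		end
	else
		-- No set name available: report the record as not found
		return "[ERROR]-RECORD_NOT_FOUND"
	end

	return x
end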

aql> execute filename.get_details() on test where DIGEST="5c4ac062dd2e7848650f4ed505dae88ba9b19856"
+----------------------------+
| get_details                |
+----------------------------+
| "SetName: testset, Key: 2" |
+----------------------------+
1 row in set (0.002 secs)

OK

aql> execute filename.get_details() on test where DIGEST="F59124986E96AD175B374C9487945BBCAD537B74"
+----------------------------+
| get_details                |
+----------------------------+
| "[ERROR]-RECORD_NOT_FOUND" |
+----------------------------+
1 row in set (0.001 secs)

OK

8. How do we identify the “Set Name” from the digest?

If the client policy has sendKey set to true, refer to the previous question. Otherwise, one can write a script that loops through all the possible sets in the namespace to find a match.

The following script will provide a list of AQL select commands that can be captured in a file and then be executed in AQL:

#!/bin/sh
# Usage: showdigest.sh <digest>   > cmds.aql
DIGEST=$1
NAMESPACE=test
SERVER_NODE=192.168.xxx.yyy

mydir=$(mktemp -d "${TMPDIR:-/tmp/}$(basename "$0").XXXXXXXXXXXX")
trap 'rm -rf "$mydir"' EXIT

# list the set names known to the node
sets=$(asinfo -h "$SERVER_NODE" -v "sets" -l | cut -d: -f2 | cut -d= -f2)
for setname in $sets
do
        # write one select statement for this set (overwrites the previous one)
        printf "select * from %s.%s where DIGEST='%s'\n" \
                "$NAMESPACE" "$setname" "$DIGEST" > "$mydir/cmds.aql"

        # capture stderr of running aql
        res="$(aql -h "$SERVER_NODE" -f "$mydir/cmds.aql" 2>&1 > /dev/null)"

        # print out the aql command that exists for the digest
        if ! echo "$res" | grep -q AEROSPIKE_ERR_RECORD_NOT_FOUND; then
                cat "$mydir/cmds.aql"
        fi
done

For example:

./showdigest.sh 5a13a9df34dd45eb281b821e35400d39ffaf754a
select * from test.set2 where DIGEST='5a13a9df34dd45eb281b821e35400d39ffaf754a'
select * from test.set3 where DIGEST='5a13a9df34dd45eb281b821e35400d39ffaf754a'
select * from test.set4 where DIGEST='5a13a9df34dd45eb281b821e35400d39ffaf754a'

./showdigest.sh 5a13a9df34dd45eb281b821e35400d39ffaf754a > commands.aql
aql -f commands.aql

Keywords

KEY DIGEST RECORD RIPEMD160 HASH

Timestamp

November 2019

We tried capturing hot-key digests using tcpdump in real time. We found one digest occurring in around 90% of total transactions, but we were not able to find the user key or set name using any of the above-mentioned methods. Is there any other way to identify that hot key?

Did you follow this guide to identify the digest?

Depending on your version of Aerospike, you can also change the logging level for the rw-client module, which would also print the digest. That may remove any false positives from the tcpdump method.

# Turn detail level logging for rw-client context
asinfo -v "set-log:id=0;rw-client=detail"
# Turn back to info
asinfo -v "set-log:id=0;rw-client=info"
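
Once digests have been captured, whether from tcpdump output or from detail-level log lines, a small tally can confirm which digest really dominates. The sketch below is only an illustration: DigestTally is a hypothetical helper, and it assumes each digest appears in the captured text as exactly 40 consecutive hex characters. The actual log or capture format may differ by version, so adjust the pattern to match.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that tallies how often each digest appears in captured text lines. */
final class DigestTally {
    // Assumption: a digest shows up as exactly 40 consecutive hex characters.
    private static final Pattern DIGEST =
            Pattern.compile("(?<![0-9a-fA-F])[0-9a-fA-F]{40}(?![0-9a-fA-F])");

    /** Count occurrences of each digest, normalised to lowercase hex. */
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            Matcher m = DIGEST.matcher(line);
            while (m.find()) {
                counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the most frequent digest, or null when no digest was found. */
    static String hottest(Map<String, Integer> counts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

The digest returned by hottest can then be used with the UDF or the set-name script from the article above.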

Also, did you try the UDF from the above article to determine the set and key? (The original key would only be stored if the client has explicitly enabled the sendKey policy.) Were there any corresponding record write failures, like record too big? Or was the client possibly trying to read a non-existent record (read not found)? Write failures from a record too big would have the most impact on your network infrastructure. In both of these cases, the record would not make it to storage, and the digest would not match an existing record.