How is Aerospike primary key organized to iterate Set effectively?

deepnighttwo · January 5, 2015, 1:14pm

I went through the online documentation on the official Aerospike website. One thing I couldn’t figure out is how the primary key is organized to support set iteration.

As the documentation mentions, a namespace (NS) is created with 4096 partitions, and each record is assigned to one of these partitions based on the hash of its key. Given this, a set looks like a logical concept rather than a physical one, so the records of a set are distributed across the 4096 partitions of the NS.

The documentation also says that Aerospike’s primary index is an RB-tree. The key is surely the digest, and the value is the index metadata (void time, write generation, and storage address). The set is not mentioned here.

The documentation also says: “Entire keyspace in a set (table) is partitioned using a robust hash function into partitions.” My guess is one of the following:

1. For each partition, each set of an NS has an individual RB-tree as its index. That means 4096 * NumberOfSet RB-trees form the whole primary index of an NS.
2. For each server, each set of an NS has an individual RB-tree as its index. In this case, the primary index of an NS would consist of NumberOfClusterNode * NumberOfSet RB-trees.

Which one is true, or is neither?

wchu · January 6, 2015, 3:37am

The set name is simply part of the key: it is prepended to the key before the key is hashed using RIPEMD-160.

There are 4096 RB trees per namespace. Nothing separate for each set.
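The routing described above can be sketched roughly as follows. This is not Aerospike's code. It assumes SHA-1 as a stand-in for RIPEMD-160 (both produce 20 bytes, and SHA-1 is available in every JDK), it assumes the partition ID comes from the low 12 bits of the first two digest bytes, and it ignores any extra framing the real client adds to the hashed input. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestSketch {
    // One index tree per partition, 4096 partitions per namespace.
    static final int PARTITIONS = 4096;

    // Hash the set name and the user key together into a single 20-byte digest.
    // ASSUMPTION: SHA-1 stands in for RIPEMD-160 here.
    static byte[] digest(String setName, String userKey) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update(setName.getBytes(StandardCharsets.UTF_8)); // set name goes in first
            md.update(userKey.getBytes(StandardCharsets.UTF_8)); // then the user key
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // ASSUMPTION: partition id = low 12 bits of the first two digest bytes.
    static int partitionId(byte[] digest) {
        int firstTwo = (digest[0] & 0xFF) | ((digest[1] & 0xFF) << 8);
        return firstTwo & (PARTITIONS - 1);
    }

    public static void main(String[] args) {
        byte[] d = digest("userstempset", "user-1");
        System.out.println("digest bytes: " + d.length + ", partition: " + partitionId(d));
    }
}
```

Because the set name is mixed into the hash rather than kept as a prefix, two records from the same set generally land in unrelated partitions and at unrelated positions within a tree.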

Will be modifying the doc to make it clearer.

Thanks.

deepnighttwo · January 6, 2015, 5:19am

If that is the case, how can Aerospike scan a set efficiently?

Suppose an NS has 1 billion records in two sets, and one set, named “less”, has only 1 million records. Given the primary index structure, how can Aerospike scan that set by iterating over only the 1 million records I need, without filtering through all 1 billion records in the NS?

deepnighttwo · January 6, 2015, 9:37am

I ran a test on set iteration using the following code:

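    import com.aerospike.client.Key;
    import com.aerospike.client.Record;
    import com.aerospike.client.query.RecordSet;
    import com.aerospike.client.query.Statement;

    // `client` is an already connected AerospikeClient.
    Statement stmt = new Statement();
    stmt.setNamespace("persistusers30d");
    stmt.setSetName("userstempset");

    long start = System.currentTimeMillis();
    // No filter is set, so this query covers every record in the set.
    RecordSet rs = client.query(null, stmt);
    try {
        while (rs.next()) {
            Key key = rs.getKey();
            System.out.println(key.toString() + "\t" + key.userKey);
            Record record = rs.getRecord();
            System.out.println(record.getValue("username"));
            System.out.println(record.getValue("interests"));
        }
    } finally {
        // Always release the record set, even if iteration fails.
        rs.close();
    }
    // Elapsed time in milliseconds.
    System.out.println(System.currentTimeMillis() - start);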

It takes about 8 s to finish. The set userstempset contains only one record, while the NS persistusers30d contains 27,008,609 records. As far as I can tell, the primary index structure does not include the set, so the set name cannot be used for filtering there. I’d really like to know how Aerospike manages to do this so fast. Where is the set name from the statement used for filtering?

wchu · January 7, 2015, 6:01pm

Even though sets do not have their own partition trees, the set information is still saved as an integer in the record’s index structure.

When doing a scan to search for all data in a set, the index tree is iterated and only those matching the set will be returned.
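A small sketch of that filtering scan, under loud assumptions: the `IndexEntry` fields, the `ScanResult` type, and the fake digest strings are all hypothetical, and a `java.util.TreeMap` (itself a red-black tree) stands in for one partition's index tree. It only illustrates that every entry is visited and the set id is checked per entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class SetScanSketch {
    // Hypothetical index entry: metadata only, including a small numeric set id.
    static final class IndexEntry {
        final int setId;
        final long storageAddr;
        IndexEntry(int setId, long storageAddr) {
            this.setId = setId;
            this.storageAddr = storageAddr;
        }
    }

    // Result of one scan: how many entries were visited and which ones matched.
    static final class ScanResult {
        final int visited;
        final List<Long> matches;
        ScanResult(int visited, List<Long> matches) {
            this.visited = visited;
            this.matches = matches;
        }
    }

    // Walk every entry in the tree and keep the storage addresses
    // of the entries whose set id matches the requested one.
    static ScanResult scanSet(TreeMap<String, IndexEntry> tree, int wantedSetId) {
        int visited = 0;
        List<Long> matches = new ArrayList<>();
        for (IndexEntry e : tree.values()) {
            visited++;
            if (e.setId == wantedSetId) {
                matches.add(e.storageAddr); // only these records are read from storage
            }
        }
        return new ScanResult(visited, matches);
    }

    public static void main(String[] args) {
        TreeMap<String, IndexEntry> tree = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            tree.put(String.format("d%04d", i), new IndexEntry(1, i)); // big set
        }
        tree.put("d9999", new IndexEntry(2, 5000L)); // lone record in a second set
        ScanResult r = scanSet(tree, 2);
        System.out.println("visited " + r.visited + ", matched " + r.matches.size());
        // prints "visited 1001, matched 1"
    }
}
```

The walk touches only index entries, and storage is read only for the matches. That would be consistent with a one-record set coming back quickly from a large namespace.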

An alternative is to create a secondary index on an integer bin in the set of interest. A range query covering the full value range will then return all data in the set.

deepnighttwo · January 9, 2015, 5:46am

Is the 20-byte digest hash(set + record key), or hash(set name) + hash(record key)?

If it is hash(set name) + hash(record key), I would expect scanning the index tree by hash(set name) to be very fast.

Is there any documentation I can refer to?

Thanks!

wchu · January 9, 2015, 6:11am

It is hash(set + record key).

Please see computeDigest() in Aerospike’s Java client library:

https://github.com/aerospike/aerospike-client-java/blob/master/client/src/com/aerospike/client/Key.java

deepnighttwo · January 9, 2015, 9:21am

In that case, how can the primary index tree be filtered using just a set name?

wchu · January 9, 2015, 5:28pm

Even though the setname is being hashed together with the key, the information is still separately sent on the wire protocol, and remembered separately as part of the record. This is why filtering based on setname is still possible.

deepnighttwo · January 10, 2015, 2:44am

So the cost of iterating over a set is: walking through the primary index with the set name as a filter, plus reading the matching records from storage. Right?

wchu · January 10, 2015, 5:28pm

Yes, this is correct!

deepnighttwo · January 12, 2015, 7:17am

One more thing to confirm: does Aerospike iterate over the whole primary index to filter by set name, or is there some search strategy that lets it skip entries (since the primary index is an RB-tree)?

wchu · January 12, 2015, 8:59pm

It iterates over the whole primary index to find the records matching the set. As a side note, Aerospike also iterates through the whole primary index periodically to expire data.
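The expiration pass can be pictured in the same way: one walk over every index entry, comparing each entry's void time against the clock. A rough sketch, assuming a void time of 0 means "never expires" and using a `java.util.TreeMap` as a stand-in for the index tree. The names are hypothetical, and the real server logic is surely more involved.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

public class ExpirySweepSketch {
    // Walk the whole index once and drop entries whose void time has passed.
    // ASSUMPTION: voidTime == 0 means the record never expires.
    static int sweep(TreeMap<String, Long> voidTimeByDigest, long now) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = voidTimeByDigest.entrySet().iterator();
        while (it.hasNext()) {
            long voidTime = it.next().getValue();
            if (voidTime != 0 && voidTime <= now) {
                it.remove(); // safe removal while iterating
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        TreeMap<String, Long> index = new TreeMap<>();
        index.put("d1", 0L);   // never expires
        index.put("d2", 100L); // already expired at now = 150
        index.put("d3", 200L); // still live
        System.out.println(sweep(index, 150L) + " removed, " + index.size() + " left");
        // prints "1 removed, 2 left"
    }
}
```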

deepnighttwo · January 12, 2015, 11:27pm

Many thanks for your detailed explanation.
