Want to know some details about set


#1

What role does a Set play? The namespace (NS) decides how data is stored and so on, and each NS has 4k physical partitions. Where does a Set fit in here? Does a Set have any impact on physical storage? If I use a statement to scan a set as follows:

    Statement stmt = new Statement();
    stmt.setNamespace("persistusers30d");
    stmt.setSetName("users");
    RecordSet rs = client.query(null, stmt);

Will it go through the whole namespace?

I also noticed that a key returned from an Aerospike scan via Key key = rs.getKey(); includes the set name in its toString() output. Since the record key is just a digest, where is the Set name stored?

This Set thing is confusing me a lot :alien:


#2

Hi Deepnighttwo —

The best way to think of a Set is like a Table (or a Collection in some NoSQLs). They are more dynamic than a Namespace.

You can use Sets when you have Keys that might otherwise collide, or when you just want to iterate over a group of records more efficiently than with a secondary index. A Record (the entire key, value, value … ) will exist in only one Set (just like a row belongs to one table in a relational system). Some installations have different Sets for different developers.

The example you have should get all the information in that Set (like a table scan).

Unlike in a relational system, you don’t have to create a Set before using it (a namespace, by contrast, must be configured with assigned storage). You can get the list of current Sets through the admin tools.

You can also replicate to a remote datacenter with XDR based on Sets (some Sets replicate, some don’t). Sets are also usually how you will configure a secondary index (on a Set and a Bin).

To sum up:

    Namespace -> "database"
    Set       -> Table
    Bin       -> Column
    Key       -> primary key
    Record    -> Row
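As a rough sketch of the mapping above, here is a small in-memory model in plain Java. It is not the Aerospike client API: NamespaceModel and its methods are hypothetical names made up for illustration. It only shows the shape of the hierarchy, that a Set appears as soon as something is written into it, and that a record lives in exactly one Set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory model of the terminology; not the Aerospike client API.
class NamespaceModel {
    final String name; // Namespace ~ "database"

    // set name (table) -> primary key -> record (row), a map of bin name (column) -> value
    private final Map<String, Map<String, Map<String, Object>>> sets = new TreeMap<>();

    NamespaceModel(String name) {
        this.name = name;
    }

    // Writing into a Set that does not exist yet simply creates it.
    void put(String setName, String key, Map<String, Object> bins) {
        sets.computeIfAbsent(setName, s -> new HashMap<>()).put(key, new HashMap<>(bins));
    }

    // Returns the record stored under (setName, key), or null if there is none.
    Map<String, Object> get(String setName, String key) {
        Map<String, Map<String, Object>> set = sets.get(setName);
        return set == null ? null : set.get(key);
    }

    // Names of all Sets that currently hold at least one write.
    Set<String> setNames() {
        return sets.keySet();
    }

    public static void main(String[] args) {
        NamespaceModel ns = new NamespaceModel("persistusers30d");
        ns.put("users", "42", Map.of("name", "Alice"));
        ns.put("orders", "42", Map.of("total", 10));
        System.out.println(ns.setNames());
        System.out.println(ns.get("users", "42"));
    }
}
```

Note that the same primary key "42" is used in both Sets without one record overwriting the other, which is the collision case mentioned earlier.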

We are continually adding more functionality that makes the Set construct richer, such as security roles based on sets.

I hope this helps!


#3

Thanks for your detailed explanation. But I am still wondering how a Set can do all this. All these features need some support from physical storage.

According to the documentation, every record key is hashed to a 20-byte digest (the Set name is not mentioned as part of this), and the primary key RBTree is built on that digest. Since the digest is the key actually used, how does Aerospike support scanning by set name (what role does the Set name play here)? Where and how is the Set name handled at the physical storage level (RAM, hard drive, or SSD)?


#4

We have published all the source code which forms the true reference, so let me give you some hints to reading those components.

The set name is hashed together with the key to create the digest. The ID of the set (looked up through a string table) is included in the in-memory structures. The full set name and key are stored in persistent storage if configured as such.
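To make these hints more concrete, here is a minimal sketch in plain Java. It is an illustration under stated assumptions, not the server code. SetIndexSketch, IndexEntry and scanSet are hypothetical names. SHA-1 is used only as a 20-byte stand-in because the JDK does not ship RIPEMD-160, which is the hash Aerospike uses as far as I know. The one-byte separator between set name and key is also an assumption. The sketch further assumes that a set scan visits every index entry in the namespace and filters on the set ID; check the source for the real behaviour.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; every name here is hypothetical, not Aerospike server code.
class SetIndexSketch {
    // String table: set name -> small numeric set ID.
    private final Map<String, Integer> setIds = new HashMap<>();
    // Stand-in for the primary index (a plain list here, not an RB tree).
    private final List<IndexEntry> index = new ArrayList<>();

    static final class IndexEntry {
        final byte[] digest; // 20-byte digest of set name + key
        final int setId;     // ID from the string table, kept in memory with the entry

        IndexEntry(byte[] digest, int setId) {
            this.digest = digest;
            this.setId = setId;
        }
    }

    // 20-byte digest over set name and user key. SHA-1 is a stand-in (assumption).
    static byte[] digest(String setName, String userKey) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update(setName.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // assumed separator so ("ab","c") and ("a","bc") differ
            md.update(userKey.getBytes(StandardCharsets.UTF_8));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 should be available on every JDK", e);
        }
    }

    // Registers the set name on first use and adds an index entry for the record.
    void put(String setName, String userKey) {
        Integer id = setIds.get(setName);
        if (id == null) {
            id = setIds.size() + 1;
            setIds.put(setName, id);
        }
        index.add(new IndexEntry(digest(setName, userKey), id));
    }

    // Set scan in this sketch: walk all entries, keep those whose set ID matches.
    List<byte[]> scanSet(String setName) {
        List<byte[]> matches = new ArrayList<>();
        Integer id = setIds.get(setName);
        if (id == null) {
            return matches; // unknown set: nothing to return
        }
        for (IndexEntry entry : index) {
            if (entry.setId == id) {
                matches.add(entry.digest);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SetIndexSketch ns = new SetIndexSketch();
        ns.put("users", "42");
        ns.put("orders", "42");
        System.out.println(Arrays.equals(digest("users", "42"), digest("orders", "42")));
        System.out.println(ns.scanSet("users").size());
    }
}
```

In this model the same user key in two different sets produces two different digests, and the set name itself never has to be part of the index entry: the small set ID is enough to filter a scan.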