Summary of Secondary Indexes



What they are:

A secondary index is an index on the values stored in a bin. You use it to query records by a value other than the primary key. You can define a secondary index on the values in a single bin within a set of records. Each secondary index has a declared datatype, and the value in a given record's bin may or may not match that datatype.

Both primary and secondary indexes live in memory, but in different places: primary indexes live in shared memory (Enterprise version only), whereas secondary indexes live in process memory. As a result, when the Aerospike process is stopped and restarted, it can re-attach to the existing primary indexes in shared memory, but any secondary indexes you have defined must be rebuilt each time the node starts. In other words, secondary indexes do not support fast-start.

The datatype of a secondary index can be either string or number. If a record contains a string value in the bin, but the secondary index is defined as a number, that record is not included in the secondary index.

If you query a secondary index that is defined as a string, but the bin in a record contains a number, the query ignores that numeric value.
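
The datatype rule can be illustrated with a small in-memory model. This is a toy sketch in plain Python, not Aerospike code: the record layout, the bin name `age`, and the helper `build_index` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records: key -> bins. Not real Aerospike data structures.
records = {
    "k1": {"age": 30},
    "k2": {"age": "30"},   # string value in the same bin
    "k3": {"name": "x"},   # indexed bin is absent
    "k4": {"age": 30},
}

def build_index(records, bin_name, index_type):
    """Map each bin value of the declared type to the keys that hold it."""
    index = defaultdict(list)
    for key, bins in records.items():
        value = bins.get(bin_name)
        # bool is a subclass of int in Python; exclude it so that only
        # true integers count as numeric in this toy model.
        if isinstance(value, index_type) and not isinstance(value, bool):
            index[value].append(key)
    return index

numeric_index = build_index(records, "age", int)
string_index = build_index(records, "age", str)

print(numeric_index[30])    # keys whose "age" bin holds the integer 30
print(string_index["30"])   # keys whose "age" bin holds the string "30"
```

In this model, record `k2` appears only in the string index and record `k3` appears in neither, which mirrors the behavior described above: a value whose type does not match the index definition is simply left out.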

How they are created:

When you restart a node, the node rebuilds the primary indexes as needed. If secondary indexes are defined for the cluster, the node must also rebuild them before it rejoins the cluster. Whenever a secondary index is created or rebuilt, the server uses the definitions in /opt/aerospike/smd to build it.

When the node rebuilds the secondary indexes, it performs the following steps:

  1. The node scans the primary indexes that are in RAM.

  2. The node reads each record from disk.

  3. If the record contains the indexed bin and the bin value is of the correct datatype, the node adds that value to the secondary index. If the bin does not exist, or its value is not of the correct datatype, the node skips the record.

  4. For each record that does match the definition of the secondary index, the node writes a digest for that value to the secondary index in process memory.

  5. When you query that bin, the secondary index returns the matching values. You can query for values that have a one-to-one or one-to-many relationship with each other.
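
The steps above can be sketched as a small simulation. This is a simplified model under stated assumptions, not the server's implementation: the "disk" and "primary index" are plain dictionaries, `toy_digest` uses SHA-1 purely as a stand-in for a record digest, and the names `rebuild_secondary_index` and `query` are hypothetical.

```python
import hashlib
from collections import defaultdict

def toy_digest(user_key):
    # Stand-in for a record digest; the real digest algorithm is not modeled.
    return hashlib.sha1(user_key.encode()).hexdigest()

# "Disk": storage location -> record bins (hypothetical layout).
disk = {
    0: {"city": "Oslo"},
    1: {"city": "Lima"},
    2: {"city": 42},        # wrong datatype for a string index
    3: {"zip": "0150"},     # indexed bin is missing
    4: {"city": "Oslo"},
}

# "Primary index" in RAM: digest -> storage location.
primary_index = {toy_digest(f"user{loc}"): loc for loc in disk}

def rebuild_secondary_index(primary_index, disk, bin_name, index_type):
    sindex = defaultdict(set)
    for digest, location in primary_index.items():   # step 1: scan primary index
        bins = disk[location]                          # step 2: read record from disk
        value = bins.get(bin_name)
        if not isinstance(value, index_type):          # step 3: skip missing or mistyped bins
            continue
        sindex[value].add(digest)                      # step 4: store digest under the value
    return sindex

def query(sindex, value):                              # step 5: look up by bin value
    return sindex.get(value, set())

sindex = rebuild_secondary_index(primary_index, disk, "city", str)
print(len(query(sindex, "Oslo")))   # number of digests stored under "Oslo"
print(len(query(sindex, "Lima")))   # number of digests stored under "Lima"
```

Because every record has to be read from the simulated disk in step 2, the cost of the rebuild in this model grows with the number of records rather than with the number of records that actually match the index, which is why a rebuild on a large data set can take a noticeable amount of time.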

How to check and speed up secondary index creation or rebuilding: