FAQ - All Flash

As of server version 4.3, Aerospike can be configured to store the Primary Index on NVMe Flash devices. This article covers some frequently asked questions regarding this feature.

1. For the storage device, can the same mount be used to store the data, or does it have to be separate?

The devices storing the Primary Index and the data should be separate SSDs. Additionally, NVMe SSD drives are recommended.

2. Can two namespaces share the same location to store indexes? For example, if a drive has more capacity than needed for one namespace’s indexes.

There may be more than one mount per namespace. Although not recommended, a mount may be shared with other namespaces. For sizing details when using index-type flash, refer to the Capacity Planning page. Refer to the mount configuration reference for further details.

Mounts can be shared across namespaces. This is because a mount is a directory, and the actual files are the index arena stages; the names of these files have the namespace (and instance) IDs built in, so files for different namespaces and instances can coexist in a directory. For example, namespace1 can use mount /mnt/nvme with size 4GiB while namespace2 also uses mount /mnt/nvme with size 8GiB (assuming the device backing /mnt/nvme has at least 12 GiB available).
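As an illustrative sketch of the example above (namespace names, the mount path, and other settings are placeholders, not a complete configuration), two namespaces sharing the same mount might look like this:

```
namespace namespace1 {
    # other namespace settings (storage-engine, etc.) omitted
    index-type flash {
        mount /mnt/nvme
        mounts-size-limit 4G
    }
}

namespace namespace2 {
    # other namespace settings omitted
    index-type flash {
        mount /mnt/nvme
        mounts-size-limit 8G
    }
}
```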

Since sharing is possible, there is a configuration item to indicate index device quotas for namespaces by mounts-size-limit. This limit is enforced only via eviction, for which there is a configurable threshold – mounts-high-water-pct. While mounts-size-limit is not a hard limit (a namespace doesn’t have to do expiration and eviction) it must be configured anyway. The minimum allowed is 4 GiB, and the maximum can’t exceed the actual space available on the mounts.

Note that while sharing mounts across namespaces is possible, it is not recommended. It may instead be beneficial for performance to use multiple mounts (and underlying devices) for one namespace.

3. How do I monitor space used for the index?

It’s essential to understand that All Flash configurations pre-allocate the index space. For example, an 8-node cluster configured with 32768 partition-tree-sprigs (sprigs per partition) and replication factor of 2 would mean that the sprigs themselves would need 128GiB of index device space on each node:

4096 (number of partitions) × 2 (replication factor) × 32,768 (sprigs per partition) × 4KiB / 8 (number of nodes) = 128GiB

The amount of memory consumed by 32,768 sprigs per partition on the same namespace is 3.25GiB across the cluster on the Enterprise Edition: each sprig has an overhead of 13 bytes, and with a replication factor of 2 there are 8,192 partition copies (4,096 × 2), giving 13 × 8,192 × 32,768 bytes = 3.25GiB.
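The arithmetic above can be captured in a small sizing helper. This is a minimal sketch, with defaults mirroring the example's 8-node cluster, 32,768 sprigs per partition, and replication factor 2:

```python
KIB = 1024
GIB = 1024 ** 3

def sprig_device_space_per_node(partitions=4096, repl_factor=2,
                                sprigs_per_partition=32768, nodes=8,
                                block_bytes=4 * KIB):
    # Pre-allocated index device space per node, in bytes:
    # every sprig gets an initial 4 KiB block on the index device.
    return partitions * repl_factor * sprigs_per_partition * block_bytes // nodes

def sprig_memory_overhead(partitions=4096, repl_factor=2,
                          sprigs_per_partition=32768, bytes_per_sprig=13):
    # In-memory sprig overhead across the cluster, in bytes
    # (13 bytes per sprig on the Enterprise Edition).
    return partitions * repl_factor * sprigs_per_partition * bytes_per_sprig

print(sprig_device_space_per_node() / GIB)  # 128.0
print(sprig_memory_overhead() / GIB)        # 3.25
```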

Statistics

The statistics to monitor are index_flash_used_pct for the used percentage and index_flash_used_bytes for the usage in bytes.

Important note: those statistics show usage based on the number of records rather than the number of sprigs instantiated. A sprig is instantiated with the first record it contains, so the primary index mount fills up at roughly 4KiB per record inserted until all configured sprigs have been instantiated. (The actual primary index mount usage can be checked directly on the system.) Once all sprigs are instantiated, primary index disk usage remains stable until sprigs start exceeding their 4KiB initial allocation (at which point each holds 64 records) and overflow into a second 4KiB block. This would impact performance and would likely require re-sizing.
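To make the 64-records-per-sprig figure concrete, the following rough estimate shows how many replicated records fit before any sprig overflows its initial 4 KiB block. This is a sketch assuming a 64-byte primary index entry and a perfectly even distribution of records across sprigs:

```python
def records_before_overflow(partitions=4096, repl_factor=2,
                            sprigs_per_partition=32768,
                            entry_bytes=64, block_bytes=4 * 1024):
    # Replicated records the namespace can hold before sprigs begin
    # to overflow their initial 4 KiB block (idealized: records are
    # assumed to hash evenly across all sprigs).
    total_sprigs = partitions * repl_factor * sprigs_per_partition
    entries_per_sprig = block_bytes // entry_bytes  # 64 entries per block
    return total_sprigs * entries_per_sprig

print(records_before_overflow())
```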

Server Logs

There is an INFO log entry for index-flash-usage for each namespace:

{ns_name} index-flash-usage: used-bytes 5502926848 used-pct 1

This line is printed every 10 seconds for each namespace configured with index-type flash.

Refer to the server log reference documentation for more details on this log line.
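For monitoring scripts, the log line can be parsed programmatically. This is a minimal sketch whose regular expression assumes the exact format shown above:

```python
import re

_PATTERN = re.compile(
    r"(?P<ns>\S+) index-flash-usage: used-bytes (?P<bytes>\d+) used-pct (?P<pct>\d+)"
)

def parse_index_flash_usage(line):
    # Extract (namespace, used_bytes, used_pct) from an
    # index-flash-usage log line; return None if it does not match.
    m = _PATTERN.search(line)
    if m is None:
        return None
    return m.group("ns"), int(m.group("bytes")), int(m.group("pct"))

sample = "{ns_name} index-flash-usage: used-bytes 5502926848 used-pct 1"
print(parse_index_flash_usage(sample))
```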

4. How important is it to set partition-tree-sprigs and to determine the fill factor based on estimated versus current records?

If the namespace is projected to grow rapidly, a lower fill fraction is more appropriate in order to leave room for future records. Full sprigs span more than a single 4KiB index block and then require more than a single index device read, impacting performance. Modifying the number of sprigs to mitigate such a situation requires a cold start to rebuild the primary index, so it is essential to determine an adequate fill factor in advance. Refer to the Capacity planning documentation for All-Flash for more details.
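As a back-of-the-envelope helper (a sketch, not the official capacity-planning formula), the following estimates a sprig count for a target fill fraction. It assumes 64 index entries per 4 KiB sprig block and rounds up to a power of two, which partition-tree-sprigs requires:

```python
import math

def sprigs_per_partition(unique_records, fill_fraction=0.5,
                         partitions=4096, entries_per_block=64):
    # Suggested partition-tree-sprigs value so that the initial
    # 4 KiB sprig blocks are, on average, `fill_fraction` full.
    per_partition = unique_records / partitions
    needed = per_partition / (entries_per_block * fill_fraction)
    return 2 ** max(0, math.ceil(math.log2(needed)))

# ~4.3 billion unique records at 50% target fill:
print(sprigs_per_partition(4_294_967_296))  # 32768
```

A lower fill fraction doubles or quadruples the sprig count, trading index device space (pre-allocated 4 KiB per sprig) for headroom before sprigs overflow.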

Another important point to consider is what happens when the cluster size is reduced (planned maintenance, unexpected shutdown, network partition splits). The min-cluster-size configuration parameter prevents sub-clusters below the configured minimum size from forming, which prevents a rapid proliferation of sprigs for previously unowned partitions from filling up the primary index mounts. Refer to Index Device Space for All-Flash for further details.
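For illustration, min-cluster-size is set in the service context. The value below is hypothetical; choose it based on your cluster size and failure tolerance:

```
service {
    # other service settings unchanged
    min-cluster-size 5
}
```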

If the number of records is not expected to change drastically, a higher fill fraction helps reduce the time needed to traverse the index (for scans, migrations, the nsup cycle, and other operations that walk the full primary index). Fuller sprigs mean fewer device read operations for a given number of records, since each read fetches more records into memory.

5. The configuration documentation states that it requires a feature-key-file, is there an additional cost for this feature?

The All-Flash feature does require a feature-key and has an additional cost. Please contact your Aerospike Account Owner for pricing.

Timestamp

September 2019

© 2015 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.