FAQ - Why does index disk usage climb so rapidly with all flash?

Detail

When using the All-Flash option (index-type flash), where the primary index is stored on disk rather than in RAM, the amount of disk used seems to climb rapidly, disproportionate to the number of records being stored. Why is this?

Answer

Disk usage for the primary index appears disproportionate to the number of records because it depends on the number of partition-tree-sprigs created to store index entries, not directly on the record count. A partition-tree-sprig is typically a 4 KiB disk block that stores up to 64 index entries, one per record digest (each entry requires 64 B).

In a correctly sized All-Flash installation, the disk usage climbs to the projected figure, calculated as follows:

4 KiB x (total unique records / (64 x fill fraction))
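As a rough worked example of the formula above, the following Python sketch computes the projected index disk usage. The helper name and the sample figures (1 billion records, fill fraction 0.5) are illustrative assumptions, not values from Aerospike. The sketch ignores replication and any rounding of sprig counts that a real deployment may apply.

```python
import math

SPRIG_BLOCK_BYTES = 4 * 1024   # one sprig occupies a 4 KiB disk block
ENTRIES_PER_SPRIG = 64         # 64 entries of 64 B each fit in one block


def projected_index_bytes(total_unique_records: int, fill_fraction: float) -> int:
    """Projected index disk usage per the formula above (hypothetical helper)."""
    if not 0 < fill_fraction <= 1:
        raise ValueError("fill_fraction must be in (0, 1]")
    # Number of sprigs needed so that each one is `fill_fraction` full.
    sprigs = math.ceil(total_unique_records / (ENTRIES_PER_SPRIG * fill_fraction))
    return SPRIG_BLOCK_BYTES * sprigs


# Illustrative numbers only: 1 billion records, sprigs expected to be half full.
usage = projected_index_bytes(1_000_000_000, 0.5)
print(f"{usage / 1024**3:.1f} GiB")  # 119.2 GiB
```

Halving the fill fraction doubles the projected disk usage, since twice as many sprigs are needed for the same number of records.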

As records flow into the system, each one is assigned to the partition / sprig pair it belongs to (based on its digest hash). Sprigs are not intended to fill up completely, because an overflowing sprig has two key consequences:

  • A given record operation involves two disk I/Os rather than one, which increases latency.
  • The sprig takes up more than a single 4 KiB block, which affects sizing calculations.

For this reason, a fill fraction is defined. It represents how full each sprig is expected to be when the system is operating at its expected capacity.
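The overflow cost described above can be sketched in a few lines of Python. This is a simplified model, not the server's actual allocation logic: it assumes a sprig grows in whole 4 KiB blocks of 64 entries each, and the function name is hypothetical.

```python
import math

ENTRIES_PER_BLOCK = 64  # a 4 KiB block holds 64 entries of 64 B each


def blocks_for_sprig(entries: int) -> int:
    """Blocks occupied by one sprig under the assumed whole-block growth model."""
    if entries < 0:
        raise ValueError("entries must be non-negative")
    # An empty sprig has not been instantiated on disk yet.
    return math.ceil(entries / ENTRIES_PER_BLOCK)


for entries in (1, 32, 64, 65):
    print(entries, "entries ->", blocks_for_sprig(entries), "block(s)")
```

Under this model a sprig at 64 entries still fits in one block, while the 65th entry pushes it to a second block, which is the point at which an operation on that sprig needs a second disk read.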

Consequently, disk usage is not a linear function of the number of records. During the initial data load, almost every new record entering the system causes a new sprig to be instantiated on disk, so disk usage grows in 4 KiB increments until all sprigs have been created. Disk usage as a function of the number of records looks similar to the following curve.

[Figure: index disk usage as a function of the number of records]

Once all sprigs have been instantiated, the disk usage flattens out. From that point on, the sprigs fill toward the desired fill fraction.
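The shape of this growth can be approximated with a short Python sketch. It assumes that records hash uniformly across sprigs and that every non-empty sprig occupies exactly one 4 KiB block (overflow is ignored). The function name and the sprig count are illustrative assumptions.

```python
SPRIG_BLOCK_BYTES = 4 * 1024  # one instantiated sprig = one 4 KiB block


def expected_index_bytes(records: int, total_sprigs: int) -> float:
    """Expected disk usage after `records` uniformly hashed inserts."""
    # Probability that a given sprig is still empty after all inserts.
    p_empty = (1 - 1 / total_sprigs) ** records
    # Expected number of instantiated sprigs, times the block size.
    return SPRIG_BLOCK_BYTES * total_sprigs * (1 - p_empty)


total_sprigs = 1_000_000  # hypothetical cluster-wide sprig count
for records in (100_000, 1_000_000, 5_000_000, 32_000_000):
    used = expected_index_bytes(records, total_sprigs)
    print(f"{records:>12,} records -> {used / 1024**2:8.1f} MiB")
```

With few records relative to sprigs, nearly every insert lands in an empty sprig and adds a full block. As the record count passes the sprig count, the curve approaches its ceiling of 4 KiB x total sprigs.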

Note that the index_flash_used_bytes metric does not represent the amount of flash storage used by the sprigs. Instead, it represents the amount of storage used by the index entries of the records, regardless of the number of sprigs. It can be calculated using the usual formula: 64 B x number of records.
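To illustrate the gap between the metric and the space actually held by sprigs, here is a minimal Python sketch. Both helper names are hypothetical, and the example assumes the early-load case described earlier, in which each record lands in its own newly instantiated sprig.

```python
INDEX_ENTRY_BYTES = 64        # per-record index entry size
SPRIG_BLOCK_BYTES = 4 * 1024  # disk block held by each instantiated sprig


def reported_index_flash_used_bytes(records: int) -> int:
    """Value the metric would report: 64 B x number of records."""
    return INDEX_ENTRY_BYTES * records


def sprig_disk_bytes(instantiated_sprigs: int) -> int:
    """Disk space actually held by instantiated sprigs."""
    return SPRIG_BLOCK_BYTES * instantiated_sprigs


records = 1_000
print(reported_index_flash_used_bytes(records))  # 64000
print(sprig_disk_bytes(records))                 # 4096000
```

In this early-load case the disk space held by sprigs is 64 times the reported metric. The two figures converge only if every sprig is completely full.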

Notes

  • Information on how to size an All-Flash index can be found in the All-Flash Capacity Planning documentation.
  • Refer to the All-Flash Configuration documentation for configuration-related details.
  • When the primary index is on disk, there is still a 13 B memory overhead, but it is incurred per sprig rather than per record.

Keywords

ALL FLASH INDEX ON DISK DISK USAGE

Timestamp

April 2020

© 2021 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.