An Aerospike node with a namespace configured for all-flash (primary index stored on disk) shuts down with the following messages in the log:
Oct 07 2021 12:51:23 GMT: CRITICAL (arenax): (arenax_ee.c:98) too many chunks
Oct 07 2021 12:51:23 GMT: WARNING (as): (signal.c:218) SIGUSR1 received, aborting Aerospike Enterprise Edition build 184.108.40.206 os el7
Oct 07 2021 12:51:23 GMT: WARNING (as): (log.c:630) stacktrace: registers: rax 0000000000000000 rbx 000000000000000a [...] 0000000000001926 r13 00007f46bbd67808 r14 00007f4971a00800 r15 000000000bcea660 rip 00007f4983521690
Oct 07 2021 12:51:23 GMT: WARNING (as): (log.c:643) stacktrace: found 11 frames: 0x6862a1 0x4ef4fb 0x7f49835217e0 0x7f4983521690 0x685a67 0x666ae7 0x4abdbc 0x4abe18 0x6746a7 0x7f498351740b 0x7f49820c50bf offset 0x0
This error occurs when there is a serious misconfiguration of partition-tree-sprigs on the all-flash namespace.
A sprig is a branch of the primary index. When the index is held in RAM, the number of sprigs is usually unimportant until the index becomes very large. Increasing the number of sprigs makes each sprig smaller, which improves index efficiency, at the expense of additional memory consumption.
When the index is on disk (all-flash), the number of sprigs becomes much more important. Disk access typically consists of 4 KiB block reads, so ideally each sprig fits entirely within a single 4 KiB disk block (chunk). A record lookup then needs one disk I/O and no more.
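The arithmetic behind this can be sketched in Python. The sketch assumes 64-byte primary index entries (so 64 entries fit in one 4 KiB chunk), Aerospike's 4096 partitions, and records spread evenly across partitions and sprigs; real distributions vary somewhat. The function names are hypothetical and not part of any Aerospike tool.

```python
import math

PARTITIONS = 4096                                     # partitions per namespace
INDEX_ENTRY_BYTES = 64                                # assumed size of one primary index entry
CHUNK_BYTES = 4096                                    # one 4 KiB disk block ("chunk")
ENTRIES_PER_CHUNK = CHUNK_BYTES // INDEX_ENTRY_BYTES  # 64 entries per chunk


def entries_per_sprig(unique_records: int, sprigs: int) -> float:
    """Average index entries in one sprig of one partition copy.

    Every copy of a partition holds all records of that partition, so the
    replication factor does not change this per-sprig figure.
    """
    return unique_records / (PARTITIONS * sprigs)


def chunks_per_sprig(unique_records: int, sprigs: int) -> int:
    """Number of 4 KiB chunks an average sprig occupies (at least 1)."""
    entries = entries_per_sprig(unique_records, sprigs)
    return max(1, math.ceil(entries / ENTRIES_PER_CHUNK))


if __name__ == "__main__":
    records = 2_000_000_000
    for sprigs in (256, 1024, 8192):
        print(sprigs, round(entries_per_sprig(records, sprigs), 1),
              chunks_per_sprig(records, sprigs))
```

For 2 billion unique records, 256 sprigs gives an average of roughly 1,907 entries per sprig (30 chunks, so potentially 30 disk reads per lookup), whereas 8,192 sprigs brings it down to about 60 entries, inside a single chunk.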
When sizing all-flash installations, care should be taken to estimate the required index size and to calculate the right number of partition-tree-sprigs, so that all index entries are stored at the desired ‘fill fraction’ (see below) and each sprig uses a single 4 KiB chunk and no more.
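A minimal sizing sketch, under the same assumptions as before (64-byte index entries, 64 entries per 4 KiB chunk, 4096 partitions, even distribution): it finds the smallest sprig count whose average sprig stays at or below the chosen fill fraction of one chunk, rounded up to a power of two on the assumption that partition-tree-sprigs expects one. The 0.5 default fill fraction is only illustrative, and `min_sprigs` is a hypothetical helper; it does not apply any server-side minimum or maximum for the setting.

```python
PARTITIONS = 4096
ENTRIES_PER_CHUNK = 4096 // 64  # assumed: 64-byte index entries per 4 KiB chunk


def min_sprigs(unique_records: int, fill_fraction: float = 0.5) -> int:
    """Smallest power-of-two sprig count that keeps the average sprig at or
    below `fill_fraction` of a single 4 KiB chunk."""
    if not 0 < fill_fraction <= 1:
        raise ValueError("fill_fraction must be in (0, 1]")
    # Entries each sprig may hold while staying at the target fill level.
    target_entries = ENTRIES_PER_CHUNK * fill_fraction
    needed = unique_records / (PARTITIONS * target_entries)
    sprigs = 1
    while sprigs < needed:  # round up to the next power of two
        sprigs *= 2
    return sprigs


if __name__ == "__main__":
    print(min_sprigs(2_000_000_000))       # half-full chunks
    print(min_sprigs(2_000_000_000, 1.0))  # completely full chunks, no headroom
```

Sizing to a full chunk (fill fraction 1.0) leaves no room for growth, which is why a lower fill fraction is normally chosen.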
This is not enforced: nothing in the code limits the number of chunks per sprig. It is possible, though deeply inadvisable, for sprigs to consist of many 4 KiB chunks. Each record lookup could then require a large number of disk I/O operations, which would be extremely detrimental to performance. The system allocates as many chunks as it needs to store the records that are loaded. The number of chunks is not directly configurable; the number of partition-tree-sprigs is used as an indirect control.
The error above occurs when a partition is dropped while its sprigs have more than 100 chunks. When the chunks are cleaned up for reuse, a sanity check runs; if a sprig has more than 100 chunks, the sprig is assumed to be corrupt and the node shuts itself down.
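A rough pre-flight check can flag configurations likely to reach this condition. The sketch below is not Aerospike's code: it reuses the earlier assumptions (64-byte entries, 4096 partitions) and the 100-chunk threshold described above. Because the check applies to any individual sprig, it includes a hypothetical `skew` factor for the case where the fullest sprig holds more than the average number of entries.

```python
import math

PARTITIONS = 4096
ENTRIES_PER_CHUNK = 4096 // 64  # assumed: 64-byte index entries per 4 KiB chunk
SANITY_LIMIT_CHUNKS = 100       # threshold described in this article


def would_trip_sanity_check(unique_records: int, sprigs: int,
                            skew: float = 1.0) -> bool:
    """Return True if the fullest sprig is expected to exceed 100 chunks.

    `skew` (hypothetical) is the ratio of the fullest sprig's entry count to
    the average; 1.0 means a perfectly even spread.
    """
    worst_entries = skew * unique_records / (PARTITIONS * sprigs)
    worst_chunks = math.ceil(worst_entries / ENTRIES_PER_CHUNK)
    return worst_chunks > SANITY_LIMIT_CHUNKS


if __name__ == "__main__":
    for sprigs in (16, 64, 128, 256):
        print(sprigs, would_trip_sanity_check(2_000_000_000, sprigs))
```

With 2 billion records, 64 sprigs averages about 120 chunks per sprig and would be flagged, while 128 sprigs averages about 60 chunks. That value passes this check yet is still far from the one-chunk target.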
If this error is observed, it indicates a major misconfiguration. The sizing should be reviewed carefully, with an Aerospike Solutions Architect if need be, before the nodes are restarted.
- The fill fraction defines the level to which a sprig is filled, leaving room for some growth without overfilling the sprig and consuming more than one chunk per sprig.
- The Linux Capacity Planning Guide gives details on how to size all-flash installations correctly.
Server 220.127.116.11 or later.