The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.
Aerospike node asserts and shuts down with ‘Too Many Chunks’ during migration
Problem Description
An Aerospike node with an all-flash configured namespace (index on disk) shuts down with the following message shown in the log.
Oct 07 2021 12:51:23 GMT: CRITICAL (arenax): (arenax_ee.c:98) too many chunks
Oct 07 2021 12:51:23 GMT: WARNING (as): (signal.c:218) SIGUSR1 received, aborting Aerospike Enterprise Edition build 5.6.0.8 os el7
Oct 07 2021 12:51:23 GMT: WARNING (as): (log.c:630) stacktrace: registers: rax 0000000000000000 rbx 000000000000000a [...] 0000000000001926 r13 00007f46bbd67808 r14 00007f4971a00800 r15 000000000bcea660 rip 00007f4983521690
Oct 07 2021 12:51:23 GMT: WARNING (as): (log.c:643) stacktrace: found 11 frames: 0x6862a1 0x4ef4fb 0x7f49835217e0 0x7f4983521690 0x685a67 0x666ae7 0x4abdbc 0x4abe18 0x6746a7 0x7f498351740b 0x7f49820c50bf offset 0x0
Explanation
This error occurs when there is a serious misconfiguration of partition-tree-sprigs on the all-flash namespace.
A sprig is a branch of the primary index. When the index is held in RAM the number of branches is usually unimportant until the index becomes very large. Increasing the number of sprigs increases index efficiency at the expense of memory consumption.
When the index is on disk (all-flash) the number of sprigs becomes much more important. In this configuration, disk access typically consists of 4 KiB block reads, so each sprig should ideally be fully contained in a single 4 KiB disk block (chunk), so that a record lookup costs exactly one disk I/O.
When sizing all-flash installations, care is taken to estimate the size of the index required and to calculate the right number of partition-tree-sprigs such that all index entries are stored at the desired ‘fill fraction’ (see below) and that each sprig uses a single 4 KiB chunk and no more.
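The sizing rule described above can be sketched as a small calculation. This is a minimal illustration, not the official sizing procedure: it assumes a 64-byte primary index entry (so 64 entries fit in one 4 KiB chunk), 4096 partitions per namespace, and that partition-tree-sprigs should be a power of two. The function name `sprigs_per_partition` and its parameters are hypothetical, and any server-side limits on the allowed values are not modelled. Always validate the result against the Linux Capacity Planning Guide.

```python
import math

PARTITIONS = 4096        # partitions per namespace
ENTRY_BYTES = 64         # ASSUMPTION: size of one primary index entry
CHUNK_BYTES = 4096       # one 4 KiB chunk
ENTRIES_PER_CHUNK = CHUNK_BYTES // ENTRY_BYTES  # 64 entries per chunk


def sprigs_per_partition(unique_records: int, fill_fraction: float = 0.5) -> int:
    """Estimate a partition-tree-sprigs value so that each sprig fits in
    one chunk at the given fill fraction (hypothetical helper)."""
    if unique_records <= 0:
        raise ValueError("unique_records must be positive")
    if not 0 < fill_fraction <= 1:
        raise ValueError("fill_fraction must be in (0, 1]")

    # Sprigs needed per partition to hold the records at the target fill level.
    needed = math.ceil(
        unique_records / (PARTITIONS * ENTRIES_PER_CHUNK * fill_fraction)
    )
    # Round up to the next power of two (ASSUMPTION: required by the server).
    return 1 << (needed - 1).bit_length()


if __name__ == "__main__":
    # Example: 4 billion unique records in the namespace, sprigs half full.
    print(sprigs_per_partition(4_000_000_000, 0.5))
```

Here `unique_records` means unique records in the namespace, not multiplied by the replication factor, because the sprig count applies to each partition's index tree.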
This is not enforced. There is no limit in the code to the number of chunks per sprig. It is possible, though deeply inadvisable, to have sprigs consist of many 4 KiB chunks. This means that for each record lookup there would be, potentially, a large number of disk I/O operations, which would be extremely detrimental to performance. The system will allocate as many chunks as it needs to store the records that are loaded in. The number of chunks is not directly configurable. The number of partition-tree-sprigs is used as an indirect control.
The error above occurs when a partition is dropped and its sprigs have more than 100 chunks. When the chunks are cleaned up for re-use, a sanity check runs: if a sprig has more than 100 chunks, the sprig is assumed to be corrupt and the node shuts itself down.
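To see how far a given configuration is from that threshold, the average number of chunks per sprig can be estimated from the record count and the configured partition-tree-sprigs value. The sketch below is a rough estimate under stated assumptions: a 64-byte index entry, 4096 partitions, and records spread evenly across sprigs (real sprigs will vary around the average). The function names are hypothetical, and the 100-chunk limit is taken from the explanation above.

```python
import math

PARTITIONS = 4096          # partitions per namespace
ENTRIES_PER_CHUNK = 64     # ASSUMPTION: 4 KiB chunk / 64-byte index entry
SANITY_LIMIT = 100         # chunks per sprig, per the sanity check described above


def avg_chunks_per_sprig(unique_records: int, sprigs: int) -> int:
    """Average chunks each sprig needs, assuming an even spread of records."""
    records_per_sprig = unique_records / (PARTITIONS * sprigs)
    return max(1, math.ceil(records_per_sprig / ENTRIES_PER_CHUNK))


def classify(unique_records: int, sprigs: int) -> str:
    """Label a configuration by its estimated chunks per sprig."""
    chunks = avg_chunks_per_sprig(unique_records, sprigs)
    if chunks > SANITY_LIMIT:
        return "at risk of 'too many chunks' abort"
    if chunks > 1:
        return "multiple I/Os per lookup"
    return "one chunk per sprig"


if __name__ == "__main__":
    # 10 billion records with only 256 sprigs per partition.
    print(avg_chunks_per_sprig(10_000_000_000, 256), classify(10_000_000_000, 256))
```

Because this works on averages, a configuration that lands close to the limit should be treated as unsafe rather than borderline.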
Solution
If this error is observed, it is indicative of a major misconfiguration. The sizing should be reviewed carefully, if need be with an Aerospike Solutions Architect, before nodes are restarted.
Notes
- The Fill Fraction defines the level to which a sprig is filled, leaving room for some expansion without overfilling and consuming more than one chunk per sprig.
- The Linux Capacity Planning Guide gives details on how to size all-flash installations correctly.
Applies To
Server 4.2.0.2 or later.
Keywords
ALL-FLASH MIGRATE CHUNKS TOO MANY
Timestamp
October 2021