ASD fails to start after a cold restart due to set limit breach on a namespace

Problem Description

ASD fails to start after a cold restart when the set limit for a namespace (1023) is breached. This happens because a cold restart resurrects records that were not durably deleted, and the sets those records belong to count toward the limit.

In aerospike.log, the following errors are observed during the cold start:

Mar 19 2021 19:48:51 GMT: WARNING (namespace): (namespace.c:332) can't add test (at sets limit)
Mar 19 2021 19:48:51 GMT: CRITICAL (xdr): (dc.c:874) {test} failed to create set
Mar 19 2021 19:48:51 GMT: INFO (xdr-client): (cluster.c:753) starting with seed nodes for dest1
Mar 19 2021 19:48:51 GMT: WARNING (as): (signal.c:218) SIGUSR1 received, aborting Aerospike Enterprise Edition build 5.5.0.3 os el7

Explanation

The error CRITICAL (xdr): (dc.c:874) {test} failed to create set indicates that the set limit on the namespace was breached during the cold restart. At this point new sets cannot be created, and the cold start fails.
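To check a log file for this condition, the log lines above can be matched with a small script. The following is a minimal Python sketch, not an official tool: the function name find_set_limit_breaches is hypothetical, and the pattern is derived only from the CRITICAL line quoted in this article, so other server versions may word the message differently.

```python
import re

# Pattern based on the CRITICAL line quoted in this article (build 5.5.0.3).
# Assumption: the namespace name appears in braces before the message text.
CREATE_FAIL_RE = re.compile(r"CRITICAL \(xdr\): .*\{([^}]+)\} failed to create set")


def find_set_limit_breaches(log_lines):
    """Return the sorted namespace names that logged 'failed to create set'."""
    namespaces = set()
    for line in log_lines:
        match = CREATE_FAIL_RE.search(line)
        if match:
            namespaces.add(match.group(1))
    return sorted(namespaces)


if __name__ == "__main__":
    sample = [
        "Mar 19 2021 19:48:51 GMT: WARNING (namespace): (namespace.c:332) can't add test (at sets limit)",
        "Mar 19 2021 19:48:51 GMT: CRITICAL (xdr): (dc.c:874) {test} failed to create set",
    ]
    print(find_set_limit_breaches(sample))
```

Because the function accepts any iterable of lines, an open file handle for aerospike.log can be passed to it directly.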

Solution

In a normal situation, if the set limit is breached while asd is running, asd simply fails to insert the record that would have created the set.

To overcome this situation, unused sets can be removed.

If the set limit is breached during the cold restart, i.e. while asd is still starting up, erase the data on the disks before restarting asd. The data will be repopulated through migrations without resurrecting older records from the storage subsystem. Alternatively, set the cold-start-empty configuration parameter to true for the namespace breaching the set limit. This parameter must be set in the aerospike.conf file. Remember to revert the configuration after the restart so that subsequent cold starts do not ignore the stored data, which could be undesirable in other situations.
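As an illustration of the cold-start-empty option, a namespace stanza in aerospike.conf might look like the sketch below. The namespace name test is taken from the log lines above; the device path is hypothetical, and the placement of cold-start-empty inside the storage-engine device block is an assumption that should be checked against the configuration reference for the server version in use.

```
namespace test {
    replication-factor 2
    storage-engine device {
        # Hypothetical device path, for illustration only.
        device /dev/sdb
        # Temporary setting: revert after the node has restarted
        # and migrations have completed.
        cold-start-empty true
    }
}
```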

Notes

  • The resurrection of deleted records by a cold start, as described above, does not happen on the Enterprise Edition if the records were deleted using truncate or durable deletes.

  • When cold-start-empty is set to ‘true’, it is necessary to wait for migrations to complete before moving on to the next node.

  • The cold-start-empty configuration parameter should only be used when the replication factor is 2 or more on the namespace.

Keywords

COLD START SET LIMIT RESURRECTED BREACHED

Timestamp

April 2021

© 2021 Copyright Aerospike, Inc. | All rights reserved. Creators of the Aerospike Database.