The asd process cold starts after a dirty exit, which appears to be clean

The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.

The asd process cold starts after a dirty exit, which appears to be clean

Problem Description

When starting asd after a successful process shutdown, with clear logs, asd always cold starts for no apparent reason.

Note that this only applies for Enterprise Edition as Community Edition will always cold start. Please also note that there are cases where asd cold starts after a graceful shutdown is normal. Such cases include:

  • shared memory cleared manually, or server rebooted
  • aerospike manually cold started
  • a disk (or disks) has been cleared and is empty
  • when using data-in-memory true on Aerospike versions below 3.15.1.3

Explanation

A typical shutdown log sequence looks like the following:

Jan 25 2021 23:46:14 GMT: INFO (as): (signal.c:203) SIGTERM received, shutting down Aerospike Enterprise Edition build 5.4.0.2 os ubuntu18.04
Jan 25 2021 23:46:14 GMT: INFO (as): (as.c:435) initiating clean shutdown ...
Jan 25 2021 23:46:14 GMT: INFO (hb): (hb.c:5822) closing mesh heartbeat sockets
Jan 25 2021 23:46:14 GMT: INFO (storage): (storage.c:246) {test} partitions shut down
Jan 25 2021 23:46:14 GMT: INFO (storage): (storage.c:251) {test} storage flushed
Jan 25 2021 23:46:14 GMT: INFO (namespace): (namespace_ee.c:159) {test} persisted arena stages
Jan 25 2021 23:46:14 GMT: INFO (namespace): (namespace_ee.c:208) {test} persisted tree roots
Jan 25 2021 23:46:14 GMT: INFO (namespace): (namespace_ee.c:238) {test} persisted trusted base block
Jan 25 2021 23:46:14 GMT: INFO (storage): (storage.c:235) {bar} storage-engine memory - nothing to do
Jan 25 2021 23:46:14 GMT: INFO (as): (as.c:446) finished clean shutdown - exiting

From 4.6 to 5.2 it would be more like:

Jul 17 2019 18:30:47 GMT: INFO (as): (signal.c:198) SIGTERM received, shutting down Aerospike Enterprise Edition build 4.6.0.2 os el6
Jul 17 2019 18:30:47 GMT: INFO (as): (as.c:446) initiating clean shutdown ...
Jul 17 2019 18:30:47 GMT: INFO (storage): (storage.c:826) {test} partitions shut down
Jul 17 2019 18:30:47 GMT: INFO (storage): (storage.c:831) {test} storage devices flushed
Jul 17 2019 18:30:47 GMT: INFO (namespace): (namespace_ee.c:159) {test} persisted arena stages
Jul 17 2019 18:30:47 GMT: INFO (namespace): (namespace_ee.c:208) {test} persisted tree roots
Jul 17 2019 18:30:47 GMT: INFO (namespace): (namespace_ee.c:238) {test} persisted trusted base block
Jul 17 2019 18:30:47 GMT: INFO (storage): (storage.c:837) {bar} storage-engine memory - nothing to do
Jul 17 2019 18:30:47 GMT: INFO (xdr): (xdr_serverside.c:161) XDR stopped.
Jul 17 2019 18:30:47 GMT: INFO (as): (as.c:451) finished clean shutdown - exiting

For versions prior to version 4.6, it would be along these lines:

Jul 17 2019 18:28:23 GMT: INFO (as): (signal.c:195) SIGTERM received, starting normal shutdown
Jul 17 2019 18:28:23 GMT: INFO (storage): (storage.c:815) initiating storage shutdown ...
Jul 17 2019 18:28:23 GMT: INFO (storage): (storage.c:816) flushing data to storage ...
Jul 17 2019 18:28:23 GMT: INFO (storage): (storage.c:835) completed flushing to storage
Jul 17 2019 18:28:23 GMT: INFO (xdr): (xdr_serverside.c:161) XDR stopped.
Jul 17 2019 18:28:23 GMT: INFO (as): (as.c:449) finished clean shutdown - exiting

Typically, once Aerospike is stopped, and the last log line is printed, the asd process will still run for a short while in order to clear the memory. The last log line in question for successful shutdown is:

Jul 01 2019 08:16:54 GMT: INFO (as): (as.c:449) finished clean shutdown - exiting

In some cases, the shutdown process can take longer.

  • When a large number of sprigs has been configured, this process may take a while as sprigs root have to be re-written in shared memory. The log message that would be delayed in getting printed is ‘{namespace} persisted tree roots’.

  • When using pmem index-type, as the CPU cache has to be persisted on shutdown. The log message that would be delayed in getting printed is ‘{namespace} persisted arena stages’.

The systemd service has a default timeout for how long to allow a process to exit after issuing SIGTERM, this is set to 10 minutes for server version 4.6 and above. For server version prior to 4.6, this timeout was set to 15 seconds. After this timeout, systemd will issue a SIGKILL.

Solution

If the timeout setting was changed manually or if you are using server version prior to 4.6, the systemd configuration can be changed to allow more time for the process to exit cleanly.

The /usr/lib/systemd/system/aerospike.service should be edited to add/edit the following line under [Service] heading. This configures the timeout prior to the SIGKILL, the value in the example below sets this to 900 seconds.

TimeoutSec=900

Notes

  • The example shown above uses 900 seconds as a timeout value. In reality this is excessively high. To determine the correct value an initially high value should be chosen, the process should be stopped and the pidof asd command should be used to check the correct time required to shutdown the asd process.
  • Documentation on cold start.

Keywords

ASD SYSTEMD COLDSTART

Timestamp

January 2021

1 Like