This article explains the scenarios that lead to stop-writes in Aerospike.
The Aerospike database server has several mechanisms to protect against running out of memory or disk space. Eviction removes data early (i.e. accelerates expiration) once the high-water mark for disk or memory is breached on a node. Defragmentation runs constantly to reclaim space on the SSD that is no longer in use.
For more information on configuring the limits, please see: Namespace Retention configuration
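The eviction behavior described above can be illustrated with a toy model. The minimal Python sketch below assumes each record has a size and a remaining TTL, uses a single hypothetical `high_water_pct` threshold, and assumes the records nearest expiration are evicted first until usage falls to the high-water mark. `Record` and `evict_until_below_hwm` are illustrative names, and this is a simplification, not the server's actual eviction implementation.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: str
    size_bytes: int
    ttl_remaining_s: int  # seconds until the record would expire naturally


def evict_until_below_hwm(records, capacity_bytes, high_water_pct):
    """Hypothetical sketch: evict the records closest to expiring
    (i.e. accelerate their expiration) until usage is at or below
    the high-water mark. Returns (kept, evicted)."""
    limit = capacity_bytes * high_water_pct / 100
    used = sum(r.size_bytes for r in records)
    # Soonest-to-expire records are considered first.
    ordered = sorted(records, key=lambda r: r.ttl_remaining_s)
    evicted = []
    while ordered and used > limit:
        victim = ordered.pop(0)
        used -= victim.size_bytes
        evicted.append(victim)
    return ordered, evicted
```

For example, with three records of 400, 300 and 200 bytes (remaining TTLs of 3600 s, 60 s and 600 s), a 1000-byte capacity and `high_water_pct=60`, only the 300-byte record is evicted, because removing the record nearest expiration brings usage down to the 600-byte mark.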
Situations for stop writes on a cluster
In a multi-node cluster, if stop-writes is triggered on the node holding the master copy of a partition, writes to that partition will fail.
If the node that has hit stop-writes holds the replica copy of the partition, the replica write is still allowed. Writes are also allowed for data arriving as part of an ongoing migration (data rebalancing).
For server versions prior to 3.15, setting the stop_writes flag waits for the ongoing namespace supervisor (NSUP) cycle to complete, so the flag may take longer to become true depending on how long the cycle takes. That duration in turn depends on the number of records, the number of namespaces, the number of records eligible for expiration or eviction, system performance, and other related factors.
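The write-acceptance rules for master, replica and migration writes on a cluster can be summarized as a small decision function. This is a hypothetical Python sketch: `WriteKind` and `write_allowed` are illustrative names, not part of any Aerospike API, and the function models only the rules stated in this section.

```python
from enum import Enum


class WriteKind(Enum):
    MASTER = "master"        # client write to a partition this node masters
    REPLICA = "replica"      # replica copy of a write mastered on another node
    MIGRATION = "migration"  # incoming data from partition rebalancing


def write_allowed(kind: WriteKind, node_in_stop_writes: bool) -> bool:
    """Hypothetical sketch: a node in stop-writes rejects master writes
    but still accepts replica and migration writes."""
    if not node_in_stop_writes:
        return True
    return kind is not WriteKind.MASTER
```

For example, `write_allowed(WriteKind.REPLICA, True)` returns `True`, while `write_allowed(WriteKind.MASTER, True)` returns `False`.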
Situations for stop writes on a node
The server is designed to stop writes (to both disk and memory) if any of the following occurs:
- Memory utilization rises above a configured threshold.
- The available percentage on disk drops below a configured threshold.
- Defragmentation cannot keep up with the number of objects being evicted.
- Eviction cannot keep up with the rate at which new data is written.
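The node-level conditions listed above can be expressed as a single check. In this minimal Python sketch, `NodeSnapshot`, `stop_writes_reasons` and the threshold arguments `memory_limit_pct` and `disk_avail_limit_pct` are hypothetical. The two "keeping up" conditions are approximated by comparing per-second rates while eviction is active, which is a simplification rather than how the server actually measures them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeSnapshot:
    # Hypothetical inputs for illustration; not actual server statistics.
    memory_used_pct: float   # memory used by the namespace, 0-100
    disk_avail_pct: float    # available disk space, 0-100
    evicted_per_s: float     # objects evicted per second
    defragged_per_s: float   # objects' worth of space reclaimed by defrag per second
    incoming_per_s: float    # new objects written per second


def stop_writes_reasons(s: NodeSnapshot, memory_limit_pct: float,
                        disk_avail_limit_pct: float) -> List[str]:
    """Return the reasons (if any) this node would enter stop-writes."""
    reasons = []
    if s.memory_used_pct > memory_limit_pct:
        reasons.append("memory above threshold")
    if s.disk_avail_pct < disk_avail_limit_pct:
        reasons.append("disk available below threshold")
    # Rate comparisons only make sense while eviction is running.
    if s.evicted_per_s > 0 and s.defragged_per_s < s.evicted_per_s:
        reasons.append("defrag not keeping up with evictions")
    if s.evicted_per_s > 0 and s.evicted_per_s < s.incoming_per_s:
        reasons.append("eviction not keeping up")
    return reasons
```

An empty list means none of the modeled conditions applies; because the check is evaluated against one node's snapshot, it also reflects that stop-writes is decided per node.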
Once a node hits stop-writes, writes to it start to fail. With a data-in-memory configuration, stop-writes triggered for either reason (disk or memory being full) is honored for both disk and memory. This behavior is per node, since the stop-writes configuration is defined per node rather than for the cluster. Two exceptions apply:
- As mentioned above, if the node that has hit stop-writes holds the replica copy of a partition, the replica write is still allowed.
- Migration writes also proceed on a node that has hit stop-writes.
Manage Storage Capacity: http://www.aerospike.com/docs/operations/manage/storage
Steps to recover from stop-writes
To recover from memory utilization capping, please see
To recover from minimum available percentage going to 0, please see
To understand the Defragmentation configuration parameters, please see
To recover from evictions not keeping up, please see Reference: evict-tenth-pct
This fraction must be tuned so that it is large enough for eviction to keep pace with the rate at which new data is added to the namespace, but small enough that eviction is as “smooth” as possible.
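The trade-off can be checked with simple arithmetic. This Python sketch assumes evict-tenth-pct caps the share of a namespace's objects evicted in one eviction pass, expressed in tenths of a percent (so a value of 5 means 0.5%), and that one pass runs per namespace supervisor cycle. The function names are hypothetical, and the keep-pace check is a simplified steady-state comparison.

```python
def max_evicted_per_cycle(total_objects: int, evict_tenth_pct: int) -> int:
    """Upper bound on objects evicted in one pass, assuming the value is
    a percentage in tenths (5 -> 0.5%)."""
    return total_objects * evict_tenth_pct // 1000


def eviction_keeps_pace(total_objects: int, evict_tenth_pct: int,
                        new_objects_per_cycle: int) -> bool:
    """True if the per-cycle eviction cap covers the objects written
    during one cycle (simplified steady-state check)."""
    return max_evicted_per_cycle(total_objects, evict_tenth_pct) >= new_objects_per_cycle
```

With 10,000,000 objects and a value of 5, at most 50,000 objects are evicted per pass, which falls behind 80,000 new objects per cycle. A value of 10 raises the cap to 100,000 and keeps pace, at the cost of larger, less smooth eviction passes.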