Understanding when server no longer accepts writes


Summary

This article explains the scenarios that lead to a stop-writes condition in Aerospike.

Resolution

The Aerospike database server has mechanisms to protect against running out of memory or disk space. The eviction algorithm evicts data (i.e. accelerates its expiration) once the high-water mark for disk or memory is breached on the node. In addition, defragmentation runs continuously to reclaim storage that is no longer in use on the SSD.

For more information on configuring the limits, please see: Namespace Retention configuration

Situation for stop writes on a cluster

In a multi-node cluster, if stop-writes is triggered on the node that holds the master copy of a partition, writes to that partition will fail.

If the node that has hit stop-writes holds the replica copy of the partition, the replica write is still allowed. Writes are also allowed for data arriving as part of an ongoing migration (data rebalancing).
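
The rule described above can be summarized in a few lines of Python. This is only an illustrative sketch of the decision, not server code: the names WriteKind and accepts_write are hypothetical and exist only for this example.

```python
from enum import Enum


class WriteKind(Enum):
    """Hypothetical labels for the kinds of write a node can receive."""
    MASTER = "client write to a partition this node masters"
    REPLICA = "replica write for a partition mastered elsewhere"
    MIGRATION = "record arriving through migration (rebalancing)"


def accepts_write(node_in_stop_writes: bool, kind: WriteKind) -> bool:
    """Return True if the node would accept the write under the rule above."""
    if not node_in_stop_writes:
        # A node that is not in stop-writes accepts every kind of write.
        return True
    # In stop-writes, only replica and migration writes are still allowed.
    return kind in (WriteKind.REPLICA, WriteKind.MIGRATION)


if __name__ == "__main__":
    for kind in WriteKind:
        print(kind.name, accepts_write(True, kind))
```

Modeling the exceptions as an explicit allow-list keeps the default for any other kind of write on a stop-writes node as "rejected", which matches the behavior described in this article.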

For server versions prior to 3.15, the stop_writes flag is not updated until the ongoing Namespace Supervisor cycle completes, so it may take longer to be set to true. The delay depends on how long the Namespace Supervisor cycle takes, which in turn depends on the number of records, the number of namespaces, the number of records eligible for expiration or eviction, system performance and other related factors.
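
To check the stop_writes flag from a script, the namespace statistics returned by the server can be parsed. The sketch below assumes the statistics arrive as a single string of semicolon-separated name=value pairs. The sample string is made up for illustration, and the statistic names in it other than stop_writes are assumptions. The helper names parse_info_pairs and is_in_stop_writes are hypothetical.

```python
def parse_info_pairs(response: str) -> dict:
    """Split a 'name=value;name=value' string into a dictionary."""
    pairs = {}
    for item in response.strip().split(";"):
        if "=" in item:
            # Split on the first '=' only, in case a value contains '='.
            name, value = item.split("=", 1)
            pairs[name] = value
    return pairs


def is_in_stop_writes(response: str) -> bool:
    """Return True if the statistics report stop_writes=true."""
    return parse_info_pairs(response).get("stop_writes", "false") == "true"


if __name__ == "__main__":
    # Illustrative sample only, not captured from a real server.
    sample = "objects=1200;stop_writes=true;memory_used_bytes=1048576"
    print(is_in_stop_writes(sample))
```

On servers older than 3.15 a script like this may keep reporting false for a while after a threshold is breached, for the reason given above.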

Situations for stop writes on a node

The server is designed to stop writes to the disk (and to memory) if any of the following occurs:

  • Memory utilization rises above a configured threshold (stop-writes-pct).
  • The available percentage on the disk falls below a configured threshold (min-avail-pct).
  • Defragmentation is not able to keep up with the number of objects evicted.
  • Eviction is not able to keep up with the rate at which new data is added.

Once a stop-writes condition is reached, writes fail outright. In a data-in-memory configuration, stop-writes triggered for either reason (disk or memory being full) applies to both disk and memory. This behavior is per node, since the stop-writes configuration is defined per node and not for the cluster as a whole.
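
The two threshold checks in the list above can be sketched as follows. This is a simplified model, not the server's implementation: the NodeUsage type and the stop_writes_reasons function are hypothetical, and the default values of 90 for stop-writes-pct and 5 for min-avail-pct are assumptions that should be replaced with the values from your own configuration.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Hypothetical snapshot of one node's usage for a namespace."""
    memory_used_pct: float  # share of configured memory in use
    disk_avail_pct: float   # available percentage reported for the disk


def stop_writes_reasons(usage: NodeUsage,
                        stop_writes_pct: float = 90.0,
                        min_avail_pct: float = 5.0) -> list:
    """Return the thresholds that are breached (empty list if none)."""
    reasons = []
    if usage.memory_used_pct > stop_writes_pct:
        reasons.append("stop-writes-pct")
    if usage.disk_avail_pct < min_avail_pct:
        reasons.append("min-avail-pct")
    return reasons


if __name__ == "__main__":
    print(stop_writes_reasons(NodeUsage(memory_used_pct=93.0, disk_avail_pct=20.0)))
    print(stop_writes_reasons(NodeUsage(memory_used_pct=60.0, disk_avail_pct=3.0)))
```

Because the check runs against a single node's usage, two nodes in the same cluster can give different answers, which is why stop-writes is a per-node condition.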

Exceptions

  • As mentioned previously, if the node that has hit stop-writes holds the replica copy of the partition, the replica write is still allowed.
  • Migration writes also proceed on a node that has hit stop-writes.

References

Manage Storage Capacity: http://www.aerospike.com/docs/operations/manage/storage

Related article: Namespace stays in stop-writes even though memory is less than stop-writes-pct

Steps to recover from stop-writes

To recover from memory utilization capping, please see

To recover from minimum available percentage going to 0, please see

To understand the Defragmentation configuration parameters, please see

To recover from evictions not keeping up, please see Reference: evict-tenth-pct

This fraction needs to be tuned so it’s big enough to allow eviction to keep pace with the rate at which new data is added to the namespace, but small enough so eviction is as “smooth” as possible.
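
As a rough way to reason about this trade-off, the arithmetic below compares the most records one eviction cycle could remove with the number of records added over the same period. It is a back-of-the-envelope sketch that assumes evict-tenth-pct is expressed in tenths of a percent of the namespace's records per eviction cycle. The function names and the record counts are hypothetical.

```python
def max_evicted_per_cycle(total_records: int, evict_tenth_pct: int) -> int:
    """Upper bound on records evicted in one cycle.

    Assumes evict_tenth_pct is in tenths of a percent, so 5 means 0.5%.
    """
    return total_records * evict_tenth_pct // 1000


def eviction_keeps_pace(total_records: int,
                        evict_tenth_pct: int,
                        new_records_per_cycle: int) -> bool:
    """True if one cycle can evict at least as many records as were added."""
    return max_evicted_per_cycle(total_records, evict_tenth_pct) >= new_records_per_cycle


if __name__ == "__main__":
    total = 1_000_000
    added = 8_000  # hypothetical number of records written per cycle
    for setting in (5, 10):
        print(setting,
              max_evicted_per_cycle(total, setting),
              eviction_keeps_pace(total, setting, added))
```

With these made-up numbers, a setting of 5 allows at most 5,000 evictions per cycle, fewer than the 8,000 records added, while a setting of 10 allows 10,000. A larger value removes more data in each cycle, so eviction is less smooth.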

Keywords

stop write stop-writes

Timestamp

10/02/2017

