The Aerospike Knowledge Base has moved to https://support.aerospike.com. Content on https://discuss.aerospike.com is being migrated to either https://support.aerospike.com or https://docs.aerospike.com. Maintenance on articles stored in this repository ceased on December 31st 2022 and this article may be stale. If you have any questions, please do not hesitate to raise a case via https://support.aerospike.com.
Cluster in stop-writes but UDFs continue writing
Problem Description
In Aerospike releases prior to Aerospike 3.8.0 a node has hit stop-writes however avail-pct continues to decrease. Analysis of the logs shows that UDFs are still executing and that the writes are being accepted by the database.
Explanation
There is a bug in Aerospike releases prior to 3.8.0 whereby UDF writes, will go through even when a database is in stop-writes. The aerospike.log will show:
aerospike.log-20160405:Apr 05 2016 06:10:47 GMT: WARNING (namespace): (namespace.c::457) {problem_namespace} hwm_breached false, stop_writes true (disk avail pct), memory sz:20890709632 (20890709632 + 0) hwm:34789236736 sw:52183851008, disk sz:777470015488 hwm:896183828480
Using either asadm in analysis mode or asloglatency we can see that UDFs are still executing.
Admin> loglatency -h udf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~cluster~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : s1 . . . |
. : % >1ms % >8ms % >64ms ops/sec |
Apr 05 2016 06:10:17: 4.38 0.06 0.00 360.8 |
Apr 05 2016 06:10:27: 5.92 0.06 0.00 319.0 |
Apr 05 2016 06:10:37: 5.53 0.12 0.00 403.6 |
Apr 05 2016 06:10:47: 3.48 0.02 0.00 448.6 |
Apr 05 2016 06:10:57: 5.93 0.06 0.00 330.7 |
Apr 05 2016 06:11:07: 6.44 0.20 0.00 295.1 |
Apr 05 2016 06:11:17: 5.06 0.00 0.00 385.6 |
Apr 05 2016 06:11:27: 8.98 0.07 0.00 405.4 |
Apr 05 2016 06:11:37: 6.66 0.14 0.00 437.2 |
Apr 05 2016 06:11:47: 5.11 0.05 0.03 381.7 |
Apr 05 2016 06:11:57: 6.67 0.00 0.00 316.2 |
Apr 05 2016 06:12:07: 5.71 0.04 0.00 276.8 |
Log messages indicate that these UDFs are writing data as the w-tot count for the devices on the namespace is increasing.
aerospike.log-20160405:Apr 05 2016 06:16:50 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /dev/xvdb: used 388800305152, contig-free 38406M (76813 wblocks), swb-free 16, w-q 0 w-tot 124912121 (17.0/s), defrag-q 0 defrag-tot 124321385 (16.6/s) defrag-w-tot 57987003 (7.9/s)
aerospike.log-20160405:Apr 05 2016 06:17:00 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /dev/xvdc: used 388722930048, contig-free 37535M (75071 wblocks), swb-free 16, w-q 0 w-tot 126356071 (15.6/s), defrag-q 0 defrag-tot 125706130 (14.6/s) defrag-w-tot 58184806 (6.9/s)
aerospike.log-20160405:Apr 05 2016 06:17:10 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /dev/xvdb: used 388801634048, contig-free 38376M (76752 wblocks), swb-free 16, w-q 0 w-tot 124912419 (14.9/s), defrag-q 0 defrag-tot 124321650 (13.2/s) defrag-w-tot 57987129 (6.3/s)
aerospike.log-20160405:Apr 05 2016 06:17:20 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /dev/xvdc: used 388724107264, contig-free 37511M (75023 wbl
Solution
This is a bug within Aerospike and has been resolved in Aerospike releases 3.8.x and higher. To resolve this issue the product should be upgraded.
As a temporary measure, shutting down all UDFs should be included in operational process if a stop-write condition occurs.
Notes
- This is tracked under JIRA AER-4907
- The asadm tool includes rich functionality for log and latency analysis in addition to a comprehensive command line administrative interface. It can be cloned from the following Git repository. GitHub - aerospike/aerospike-admin: Aerospike Administration tool: allows operations to view vital stats from the aerospike server from the command line.
Keywords
UDF STOP-WRITES
Timestamp
5/6/16