Stop Write on batch loading

fryifta · July 27, 2023, 11:39pm

Hi, stop-writes is being triggered for no apparent reason. I'm not sure what is going on, as there is plenty of memory and storage available. It is triggered as soon as I try to load data in a batch. Thanks for the help!

single namespace:

namespace udb {
    replication-factor 2
    memory-size 30G
    high-water-memory-pct 85

    storage-engine device {
        file /opt/aerospike/data/udb.dat
        data-in-memory true
        write-block-size 128K
        filesize 50G
    }

}
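
For readers following along, here is a minimal sketch of the arithmetic behind the memory settings above. It only converts `memory-size 30G` and `high-water-memory-pct 85` from the posted config into a byte threshold. The helper name `parse_size` is made up for this illustration, and the `stop_writes_pct` value of 90 is an assumption (it is not set in the posted config), so check the effective value on your own cluster.

```python
# Sketch: turn the namespace memory settings from the post into byte thresholds.
# parse_size is a hypothetical helper, not part of any Aerospike tool.

def parse_size(text: str) -> int:
    """Convert a size such as '30G' or '128K' into bytes (binary units)."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    suffix = text[-1].upper()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)

memory_size = parse_size("30G")   # memory-size from the posted config
high_water_pct = 85               # high-water-memory-pct from the posted config
stop_writes_pct = 90              # ASSUMPTION: not in the posted config

hwm_bytes = memory_size * high_water_pct // 100
stop_bytes = memory_size * stop_writes_pct // 100

print(f"high-water mark: {hwm_bytes / 1024**3:.1f} GiB")
print(f"namespace stop-writes (assumed pct): {stop_bytes / 1024**3:.1f} GiB")
```

Note that these are namespace-level thresholds. As the replies below point out, stop-writes can also be triggered by conditions outside the namespace accounting, which is why the namespace counters can look healthy while writes are refused.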

pgupta · July 28, 2023, 4:49am

Can you check the server log on the node that is showing stop-writes true and see whether you have hit the clock-skew stop-writes? All server nodes should be synced to an NTP server, and their clocks should not drift apart by more than 40 seconds in AP mode, or you will hit stop-writes.
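
The clock-skew condition described above can be sketched as a simple comparison. This is only an illustration: the node names and timestamps are invented, and the function works on values you have collected yourself rather than calling any Aerospike API. The 40-second limit is the AP-mode figure quoted in the reply.

```python
# Sketch: compare node clocks against the 40 s AP-mode limit quoted above.
# Node names and timestamps below are hypothetical sample data.

AP_SKEW_LIMIT_SEC = 40.0

def max_skew_seconds(node_clocks: dict) -> float:
    """Largest difference between any two node clocks, in seconds."""
    values = list(node_clocks.values())
    return max(values) - min(values)

def skew_exceeds_limit(node_clocks: dict, limit: float = AP_SKEW_LIMIT_SEC) -> bool:
    """True when the spread between node clocks is larger than the limit."""
    return max_skew_seconds(node_clocks) > limit

# Epoch seconds read from each node at (roughly) the same moment.
clocks = {"node15": 1690500000.0, "node16": 1690500003.5, "node17": 1690500045.0}
print(max_skew_seconds(clocks), skew_exceeds_limit(clocks))
```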

fryifta · July 28, 2023, 2:08pm

Thanks for the response. The servers are using NTP, so it's not time drift. I probably left out an important detail: I added 3 nodes (15-17) because the cluster was reaching its limits. While the namespace counters showed that the cluster was balanced, I noticed that the asd process on the existing servers was still holding on to memory and was at 91%. Once restarted, asd released the memory, and its consumption is now more in line with the namespace memory usage. Is this the expected behavior (not releasing unused memory until restart)?

pgupta · July 28, 2023, 7:46pm

Memory used for data should be released. Memory used for storing the Primary Index (64 bytes per record) is not released unless you restart the server and rebuild the Primary Index. This is a consequence of the way arena stage memory allocation works for the PI.
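
To get a feel for how much memory the Primary Index can pin, here is a rough estimate based on the 64 bytes per record mentioned above. The record count and node count are hypothetical, and the even spread across nodes is an assumption; the replication factor of 2 comes from the posted config.

```python
# Sketch: estimate Primary Index (PI) memory from the 64 bytes/record figure.
# Record count and node count are hypothetical; an even spread is assumed.

PI_BYTES_PER_RECORD = 64

def primary_index_bytes(master_records: int, replication_factor: int) -> int:
    """Cluster-wide PI memory: every replica copy has its own index entry."""
    return master_records * replication_factor * PI_BYTES_PER_RECORD

def per_node_gib(master_records: int, replication_factor: int, nodes: int) -> float:
    """Approximate PI memory per node, assuming records are evenly balanced."""
    return primary_index_bytes(master_records, replication_factor) / nodes / 1024**3

records = 100_000_000  # hypothetical number of master records
print(primary_index_bytes(records, 2))           # cluster-wide bytes
print(round(per_node_gib(records, 2, 3), 2))     # per node before adding nodes
print(round(per_node_gib(records, 2, 6), 2))     # per node after adding nodes
```

The point of the before/after comparison is that the index entries a node needs shrink when nodes are added, but, per the explanation above, the memory already allocated for the PI on the old nodes stays allocated until a restart.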

meher · July 31, 2023, 9:52pm

With data-in-memory true, you can also get heap fragmentation, which can then trigger stop-writes-sys-memory-pct, which was introduced in the version you are running (6.3). You can check the heap_efficiency_pct metric. A restart will reload the data into memory and bring heap_efficiency_pct back to close to 100… Stay tuned for some big changes in this area (data in memory) later this year.
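
As a rough aid for checking the metrics mentioned above, here is a small sketch that parses semicolon-separated `key=value` statistics text (the form returned by `asinfo -v statistics`) and flags low heap efficiency or high system memory use. The sample string is invented, the warning thresholds (60 and 90) are assumptions chosen for illustration, and reading system memory use as `100 - system_free_mem_pct` is likewise an assumption; check the configured stop-writes-sys-memory-pct on your nodes.

```python
# Sketch: flag memory fragmentation and system memory pressure from
# statistics text. Sample data and thresholds are hypothetical.

def parse_info(text: str) -> dict:
    """Parse 'key=value;key=value' statistics text into a dict of strings."""
    stats = {}
    for pair in text.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            stats[key] = value
    return stats

def memory_warnings(stats: dict, min_heap_efficiency: int = 60,
                    sys_memory_stop_pct: int = 90) -> list:
    """Return human-readable warnings; missing metrics are treated as healthy."""
    warnings = []
    heap = int(stats.get("heap_efficiency_pct", 100))
    free = int(stats.get("system_free_mem_pct", 100))
    if heap < min_heap_efficiency:
        warnings.append(f"heap_efficiency_pct is {heap}: heap looks fragmented")
    if 100 - free >= sys_memory_stop_pct:
        warnings.append(f"system memory used is {100 - free}%: stop-writes risk")
    return warnings

sample = "cluster_size=6;heap_efficiency_pct=55;system_free_mem_pct=9"
for line in memory_warnings(parse_info(sample)):
    print(line)
```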

system · July 30, 2024, 9:53pm

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
