Loading data only up to the HWM (high water mark) on cold start

My database's RAM usage was over the HWM, and after a cold start some data was lost because of the HWM. Is that correct?

BTW: I disabled eviction.

Correct: during cold start, Aerospike reads up to the defined high water marks and evicts once a mark is exceeded.
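
If it helps, you can check the configured water marks on a running node with a filtered config query; the exact "like" filter below is an assumption about your asadm version:

asadm -e "show config namespace like high-water"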

Can you describe how you have disabled evictions?

I have configured asd with:

namespace xxx {
  evict-tenths-pct 0
}

Also, no TTL is set on any key.
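
Purely to illustrate what I mean (default-ttl 0 here is an assumption about how to express this in the config; our clients also never pass a TTL), the namespace would look roughly like:

namespace xxx {
  default-ttl 0          # records never expire unless a client sets a TTL
  evict-tenths-pct 0
}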

The typical way to disable eviction is to increase high-water-memory-pct and high-water-disk-pct (see the Configuration Reference in the Aerospike documentation) to 100. After a quick code inspection, I suspect setting evict-tenths-pct to 0 could result in an endless loop on cold start.

If my code inspection is correct, this would be the reason why you didn’t hit that issue.
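
As a sketch of that typical approach, assuming the namespace is named xxx as in your snippet, the relevant part of aerospike.conf would look like:

namespace xxx {
  high-water-memory-pct 100    # memory-based eviction effectively never triggers
  high-water-disk-pct 100      # disk-based eviction effectively never triggers
}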

Could you provide the output of:

asadm -e "show distribution"

And are you able to provide the logs from the cold start?

Maybe setting high-water-memory-pct is better than evict-tenths-pct, since nsup won't scan all the partitions. But I think cold start should load data up to stop-writes rather than the HWM: since we allow writes between the HWM and stop-writes during normal operation, there is no reason to stop loading at the HWM on cold start.
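
To illustrate the gap I mean, with example values like these (illustrative only, not our actual settings) writes are still accepted well past the eviction threshold, yet cold start would stop loading at the lower mark:

namespace xxx {
  high-water-memory-pct 60    # eviction threshold, and the cold start limit
  stop-writes-pct 90          # writes are still accepted between 60% and 90% of memory-size
}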

I read the related code again, but failed to find the endless loop.

Although I don't think this has anything to do with the cluster topology, here is the output:

       Percentage of records having objsz less than or equal to
                   value measured in Record Blocks
    Node   10%   20%   30%   40%   50%   60%   70%   80%   90%   100%
in24-140     3     3     4     4     4     5     6     8    12    100
in24-145     3     3     4     4     4     5     6     8    12    100
in24-146     3     3     4     4     4     5     6     8    12    100
in24-147     3     3     4     4     4     5     6     8    12    100
in24-148     3     3     4     4     4     5     6     8    12    100
in24-149     3     3     4     4     4     5     6     8    12    100
Number of rows: 6

I am sorry but the log was rotated and deleted.

The show distribution command should have had more output; I was looking for the TTL histogram.
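
Depending on your asadm version, you may also be able to request just that histogram with something like the following (the subcommand name is an assumption about your tool version):

asadm -e "show distribution time_to_live"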

The main reason for reading only up to the HWM is that the RAM allocated to the primary index is never freed: if we were to read up to stop-writes and then evict down to the HWM, the extra arenas allocated to hold the additional records would not be released. For some deployments, this could be a significant amount of RAM.
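
If you want to see how much RAM the primary index currently holds on your nodes, a filtered statistics query along these lines should show it (statistic names differ between server versions, so treat the filter as an approximation):

asadm -e "show statistics namespace like memory"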

~~~~~~~~~~~~~~~impression - TTL Distribution in Seconds~~~~~~~~~~~~~~
        Percentage of records having ttl less than or equal to
                      value measured in Seconds
    Node   10%   20%   30%   40%   50%   60%   70%   80%   90%   100%
in24-140     0     0     0     0     0     0     0     0     0      0
in24-145     0     0     0     0     0     0     0     0     0      0
in24-146     0     0     0     0     0     0     0     0     0      0
in24-147     0     0     0     0     0     0     0     0     0      0
in24-148     0     0     0     0     0     0     0     0     0      0
in24-149     0     0     0     0     0     0     0     0     0      0
Number of rows: 6

I think that is correct, as we don't use any TTL mechanism.

Thanks.