Aerospike restart takes about 6 days


#1

Hi all. We are having trouble restarting Aerospike. Here is our setup:

  • we have a namespace that uses storage-engine device

  • the device is a 2 TB HDD

  • the storage currently holds only 100 GB of data

    namespace tv20 {
        replication-factor 2
        memory-size 26G
        default-ttl 14d

        storage-engine device {
            device /dev/sdb
            write-block-size 128K
        }
    }

Our problem started with data being evicted from the namespace. I found the topic "Why Aerospike evicted data?" and adjusted the Aerospike config:

    namespace tv20 {
        high-water-memory-pct 90   # added
        high-water-disk-pct 90     # added
        stop-writes-pct 90         # added
        replication-factor 2
        memory-size 26G
        default-ttl 14d

        storage-engine device {
            device /dev/sdb
            write-block-size 128K
        }
    }

Then I simply ran service aerospike restart from the command line. So far the restart has taken 4 days, and the estimate is about 2 more days:

    Jul 31 2017 05:55:21 GMT: INFO (sindex): (secondary_index.c:4409)  Sindex-ticker: ns=tv20 si=<all> obj-scanned=147000000 si-mem-used=606951 progress= 60% est-time=167798425 ms
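The est-time value in the Sindex-ticker line is in milliseconds. A quick Python conversion (illustrative only) shows it works out to roughly two days, which matches the remaining estimate; added to the four days already elapsed, that gives the six-day total:

```python
# est-time copied from the Sindex-ticker log line; the log states it is in ms
EST_TIME_MS = 167_798_425

MS_PER_HOUR = 1000 * 60 * 60


def ms_to_hours(ms: int) -> float:
    """Convert a duration in milliseconds to hours."""
    return ms / MS_PER_HOUR


hours = ms_to_hours(EST_TIME_MS)
days = hours / 24
print(f"remaining: {hours:.1f} h = {days:.2f} days")  # remaining: 46.6 h = 1.94 days
```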

So restarting Aerospike with 100 GB of data will take about 6 days. That is a very bad situation for production. How can I solve this problem? How can I cut the restart time down to 5-10 minutes? Maybe I’m doing something wrong?


aerospike-server-community           3.14.0.2-1

DISTRIB_ID=Ubuntu                         
DISTRIB_RELEASE=14.04                     
DISTRIB_CODENAME=trusty                   
DISTRIB_DESCRIPTION="Ubuntu 14.04.5 LTS"

#2

Well, you could start empty and let migrations repopulate the node. Chances are your disk is slow, though.


#3

But is it necessary to clear the data on the storage device?

I read that Aerospike recommends running dd if=/dev/zero of=/dev/%storage%

And what should I do if both servers in the cluster restart? We have only 2 servers in the cluster, and if we lose data, that will be a disaster.

And yes, we use HDDs instead of SSDs.


#4

We recommend setting write-block-size to be 1M with HDDs.
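One way to see why larger blocks can help on an HDD, under simplifying assumptions that are mine rather than from this thread: if the ~100 GB of data sits in completely full write blocks and a cold start reads each block with one request, then 1M blocks mean 8x fewer requests (and so fewer seeks) than 128K blocks. A Python sketch of that count:

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Assumption: the ~100 GB from the post, treated as 100 GiB, packed into full blocks
DATA_BYTES = 100 * GIB


def blocks_needed(data_bytes: int, block_bytes: int) -> int:
    """Ceiling division: a partially filled last block still costs a read."""
    return -(-data_bytes // block_bytes)


for label, size in (("128K", 128 * KIB), ("1M", MIB)):
    print(f"write-block-size {label}: {blocks_needed(DATA_BYTES, size):,} blocks")
```

This ignores fragmentation, partially filled blocks, and read-ahead, so the ratio between the two settings is the takeaway, not the absolute counts.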

You appear to be spending a long time building secondary indexes. Take a look at "How to speed up Secondary index creation or re-building?" to see how to speed this process up, but I don’t believe it will help if you are IO-limited by your storage device.

You can tell whether you are IO-limited by running iostat -x 1 and checking whether %util for sdb is close to 100%, or by running top and looking for a significant %wa (IO wait) value.
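The %util check can be scripted. The Python sketch below is a hypothetical helper (util_pct and looks_io_bound are illustrative names, not part of iostat or Aerospike). It assumes the sysstat iostat -x layout, where each report has a header line starting with "Device" or "Device:" that includes a %util column, and that numbers use a dot as the decimal separator. The 90% threshold is an arbitrary stand-in for "close to 100%".

```python
from typing import Optional


def util_pct(iostat_text: str, device: str) -> Optional[float]:
    """Return %util for `device` from the last report in `iostat -x` output.

    Assumes the sysstat layout: a header line whose first field starts with
    "Device" and contains a "%util" column, followed by one line per device.
    """
    header = None
    latest = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            header = fields  # each report repeats the header
        elif header is not None and fields[0] == device:
            # header fields line up one-to-one with the device's value columns
            latest = float(fields[header.index("%util")])
    # with `iostat -x 1`, the first report is the since-boot average,
    # so keeping the last match favors the most recent interval
    return latest


def looks_io_bound(util: float, threshold: float = 90.0) -> bool:
    """Hypothetical threshold standing in for "close to 100%"."""
    return util >= threshold
```

For example, capture two reports with iostat -x 1 2 and pass the captured text to util_pct(text, "sdb").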


#5

It’s not necessary, but there are implications of not clearing the data. Another option, of course, is to use cold-start-empty. If IO is the limiting factor, migration speed will still be limited by IO. I’d definitely recommend following what rguo has said, especially using iostat to identify whether that is your bottleneck.