Restart of Aerospike takes about 6 days

DeamonMV · July 31, 2017, 6:39am

Hi all. We ran into trouble restarting Aerospike. Here is the setup:

We have a namespace that uses storage-engine device.
The device is a 2 TB HDD.

For now the storage contains only 100 GB of data.

  namespace tv20 {                          
  replication-factor 2              
  memory-size 26G                   
  default-ttl 14d

  device /dev/sdb   
  write-block-size 128K

Our problem started with eviction of data from the namespace. I found the topic "Why Aerospike evicted data?" and adjusted the Aerospike config:

    namespace tv20 {                          
    high-water-memory-pct 90 # added         
    high-water-disk-pct 90   # added
    stop-writes-pct 90       # added
    replication-factor 2              
    memory-size 26G                   
    default-ttl 14d

    device /dev/sdb   
    write-block-size 128K

Then I simply ran service aerospike restart from the command line. So far the restart has taken 4 days, and the estimate is about 2 more days.

Jul 31 2017 05:55:21 GMT: INFO (sindex): (secondary_index.c:4409)  Sindex-ticker: ns=tv20 si=<all> obj-scanned=147000000 si-mem-used=606951 progress= 60% est-time=167798425 ms

So restarting Aerospike with 100 GB of data will finish in about 6 days. That is a very bad situation for production. How can I resolve this problem? How can I reduce the restart time to 5-10 minutes? Maybe I’m doing something wrong?
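For anyone checking the arithmetic, here is a minimal Python sketch that converts the est-time field of the ticker line above into days. It assumes est-time is the estimated remaining time in milliseconds, which matches the "about 2 days" reading in this post; the function name parse_sindex_ticker is made up for illustration.

```python
import re

# Ticker line copied from the post above.
LINE = ("Jul 31 2017 05:55:21 GMT: INFO (sindex): (secondary_index.c:4409)  "
        "Sindex-ticker: ns=tv20 si=<all> obj-scanned=147000000 "
        "si-mem-used=606951 progress= 60% est-time=167798425 ms")


def parse_sindex_ticker(line):
    """Return (progress_pct, est_days) from a Sindex-ticker log line.

    Assumption: est-time is the remaining time in milliseconds.
    """
    progress = re.search(r"progress=\s*(\d+)%", line)
    est_ms = re.search(r"est-time=\s*(\d+)\s*ms", line)
    if not progress or not est_ms:
        raise ValueError("not a Sindex-ticker line")
    # milliseconds -> seconds -> days
    est_days = int(est_ms.group(1)) / 1000 / 86400
    return int(progress.group(1)), est_days


progress_pct, est_days = parse_sindex_ticker(LINE)
print(f"progress={progress_pct}% remaining~{est_days:.2f} days")
```

Adding the 4 days already spent to the remaining time gives the roughly 6 days quoted in the title.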

aerospike-server-community           3.14.0.2-1

DISTRIB_ID=Ubuntu                         
DISTRIB_RELEASE=14.04                     
DISTRIB_CODENAME=trusty                   
DISTRIB_DESCRIPTION="Ubuntu 14.04.5 LTS"

Albot · July 31, 2017, 11:04pm

Well, you could start empty and allow migrations. Chances are your disk is slow though.

DeamonMV · August 1, 2017, 6:22am

But is it necessary to clear the data on storage?

I read that Aerospike recommends running dd if=/dev/zero of=/dev/%storage%

And what should I do if both servers in the cluster restart? We have only 2 servers in the cluster. If we lose data, that will be a disaster.

And yes, we use an HDD instead of an SSD.

rguo · August 1, 2017, 10:14pm

We recommend setting write-block-size to be 1M with HDDs.

You appear to be taking a long time with building secondary indexes. You can take a look at How to check and speed up Secondary index creation or re-building? to see how to speed this process up, but I don’t believe this would help if you are IO limited from your storage device.

You can tell if you are IO limited by running iostat -x 1 and checking whether %util for sdb is close to 100%, or by looking in top for a significant %wa value.
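If you would rather script the %util check than watch the terminal, something like the Python sketch below could work. It locates the %util column from the header row, because the iostat -x column layout differs between sysstat versions. The helper name device_util, the sample text, and the 90% threshold are all made up for illustration; the sample is not a real measurement from this system.

```python
def device_util(iostat_text, device="sdb"):
    """Return the %util values reported for `device` in `iostat -x` output."""
    util_col = None
    values = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The header row names the columns; older sysstat prints "Device:".
        if fields[0].rstrip(":") == "Device" and "%util" in fields:
            util_col = fields.index("%util")
        elif util_col is not None and fields[0] == device and len(fields) > util_col:
            values.append(float(fields[util_col]))
    return values


# Made-up sample in the shape of `iostat -x` output; not real measurements.
SAMPLE = """\
Device:  rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda      0.00 1.00 0.00 2.00 0.00 12.00 12.00 0.00 0.50 0.00 0.50 0.50 0.10
sdb      0.00 0.00 180.00 0.00 23040.00 0.00 256.00 1.00 5.50 5.50 0.00 5.50 99.20
"""

utils = device_util(SAMPLE)
# 90% is an arbitrary illustrative threshold for "close to 100%".
saturated = bool(utils) and min(utils) >= 90.0
print(utils, "IO-bound" if saturated else "not IO-bound")
```

To use it on a live system, you could capture a few samples with iostat -x 1 5 and pass the captured text to device_util.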

Albot · August 1, 2017, 11:18pm

It’s not necessary, but there are implications of not clearing data. Another option is of course to cold-start-empty. If IO is the limiting factor, migration speed will still be limited by IO. I’d definitely recommend following what rguo has said, especially using iostat to identify whether that is your bottleneck.
