Asrestore problem: put failed in restore: unusual error 18 trying again


#1

I am running Aerospike 3.5.14 and 3.5.15 and have a problem with asrestore: after the restore finishes, the data is only 12 GB when it should be ~250 GB.

Centos7 HP GEN9 with H240 Smart Host Bus Adapter with single 512GB SSD Samsung 850 EVO as RAID0

    put failed in restore: unusual error 18 trying again
    restore: too many consecutive put failure
    put failed in restore: unusual error 18 trying again
    put failed in restore: unusual error 18 trying again
    put failed in restore: unusual error 18 trying again
    put failed in restore: unusual error 18 trying again
    restore: too many consecutive put failure
    Aug 11 2015 14:41:13 GMT: expired 0 : skipped 0 : attempted 512419 : [updated 333467 not-updated (existed 0 gen-old 178952)]
    put failed in restore: unusual error 18 trying again
    put failed in restore: unusual error 18 trying again
    put failed in restore: unusual error 18 trying again
    put failed in restore: unusual error 18 trying again
    put failed in restore: unusual error 18 trying again
    put failed in restore: unusual error 18 trying again
    Aug 11 2015 14:41:14 GMT: starting restore: filename: /var/backup/asbackup//BB91864BB3539E8_00214.asb FILE 0x7fcb080008c0
    Aug 11 2015 14:41:14 GMT: starting restore: filename: /var/backup/asbackup//BB91864BB3539E8_00213.asb FILE 0x7fca5c0008c0
    Aug 11 2015 14:41:14 GMT: expired 0 : skipped 0 : attempted 514666 : [updated 335567 not-updated (existed 0 gen-old 179099)]
    Aug 11 2015 14:41:15 GMT: expired 0 : skipped 0 : attempted 517187 : [updated 337940 not-updated (existed 0 gen-old 179247)]
    Aug 11 2015 14:41:16 GMT: expired 0 : skipped 0 : attempted 519931 : [updated 340444 not-updated (existed 0 gen-old 179487)]
    Aug 11 2015 14:41:17 GMT: expired 0 : skipped 0 : attempted 522591 : [updated 342894 not-updated (existed 0 gen-old 179697)]
    Aug 11 2015 14:41:18 GMT: expired 0 : skipped 0 : attempted 525190 : [updated 345250 not-updated (existed 0 gen-old 179940)]
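
For anyone reading a log like the one above, here is a minimal Python sketch that counts the "put failed" lines and estimates throughput from the progress lines. It assumes the progress lines always have exactly the layout shown in this post; the function names and dictionary keys are made up for illustration and are not part of asrestore.

```python
import re
from datetime import datetime

# Matches the asrestore progress lines as they appear in this thread.
# Assumption: the layout is stable; other asrestore versions may differ.
PROGRESS = re.compile(
    r"(?P<ts>\w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}) GMT: "
    r"expired (?P<expired>\d+) : skipped (?P<skipped>\d+) : "
    r"attempted (?P<attempted>\d+) : "
    r"\[updated (?P<updated>\d+) not-updated "
    r"\(existed (?P<existed>\d+) gen-old (?P<gen_old>\d+)\)\]"
)


def parse_progress(lines):
    """Return a list of (timestamp, counters) for every progress line."""
    samples = []
    for line in lines:
        m = PROGRESS.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%b %d %Y %H:%M:%S")
            counts = {k: int(v) for k, v in m.groupdict().items() if k != "ts"}
            samples.append((ts, counts))
    return samples


def summarize(log_text):
    """Count put failures and compute rates between first and last sample."""
    lines = log_text.splitlines()
    samples = parse_progress(lines)
    summary = {
        "put_errors": sum("put failed in restore" in l for l in lines),
        "attempted_per_sec": None,
        "updated_per_sec": None,
    }
    if len(samples) >= 2:
        (t0, first), (t1, last) = samples[0], samples[-1]
        seconds = (t1 - t0).total_seconds()
        if seconds > 0:
            summary["attempted_per_sec"] = (
                last["attempted"] - first["attempted"]
            ) / seconds
            summary["updated_per_sec"] = (
                last["updated"] - first["updated"]
            ) / seconds
    return summary


if __name__ == "__main__":
    import sys
    # Usage: python restore_summary.py < asrestore.log
    print(summarize(sys.stdin.read()))
```

Fed the 14:41:14 and 14:41:18 progress lines from the log above, this gives 2631 attempted puts per second, of which about 2421 per second were counted as updated.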

The config was tuned with the afterburner script:

namespace example {
        replication-factor 2
        memory-size 24G
        default-ttl 120d
        high-water-disk-pct 50
        high-water-memory-pct 50
        stop-writes-pct 90
        storage-engine memory
        storage-engine device {
                device /dev/sdc
                scheduler-mode noop
                write-block-size 128K
                data-in-memory false
        }
}

Launched with one thread, the restore works without a problem, but recovery takes too long; the default is 20 threads!

Has anyone worked out how best to configure Aerospike with an HP Smart Array?


#2

Centos7 HP GEN9 with H240 Smart Host Bus Adapter with single 512GB SSD Samsung 850 PRO as RAID0

Could you provide iostat output so we can check disk I/O during the restore?

iostat -x 1 10

Could you also elaborate on “Launched in one thread works without a problem, recovery takes too long , by default 20 threads !”?

Could you also provide the service stanza from your aerospike.conf (service-threads, transaction-queues, transaction-threads-per-queue) to confirm the threads used?

A few recommendations:

  1. Make sure the disk cache policy for each drive is set to Write Through (WT) and No Read Ahead (NORA).

  2. Ensure that your Samsung 850 Pro is over-provisioned prior to using it with Aerospike.

https://discuss.aerospike.com/t/what-is-the-best-way-to-over-provision-ssd-using-partitions/511

https://www.aerospike.com/docs/operations/plan/ssd/ssd_setup.html


#3
service {
        user root
        group root
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        pidfile /var/run/aerospike/asd.pid
        service-threads 32
        transaction-queues 32
        transaction-threads-per-queue 3
        proto-fd-max 15000
}

By default asrestore uses 20 threads; limited to 1 thread, it works without a problem.

sda is the system disk, sdb holds the filesystem with the backup to restore, and sdc is the dedicated Aerospike drive.

asrestore -d . -r -v -t 20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdb               0,00     0,00 5342,00    0,00 422516,00     0,00   158,19    22,74    4,25    4,25    0,00   0,18  98,20
sdc               0,00     0,00   97,00   91,00 11714,50 11648,00   248,54     1,90   10,20    9,41   11,03   5,32 100,00

asrestore -d . -r -v -t 1

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdb               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00    0,00    0,00   0,00   0,00
sdc               0,00     0,00   47,00   54,00  5774,00  6912,00   251,21     0,75    7,46    3,79   10,65   5,91  59,70

Thanks for your reply!

Today I will try with an over-provisioned drive and different HP Smart Array settings.


#4

No problem. Let us know how it goes with over-provisioning.

sdc is completely pegged during the restore, with an await of 10.20 ms and 100% %util.

Seems like your drive can’t keep up with the speed of the database.
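
To make that check repeatable, here is a rough Python sketch that parses one `iostat -x` report like the ones posted above and lists the devices whose %util is at or above a limit. The 95% limit and the function names are assumptions chosen for illustration; the parser handles the decimal commas seen in this thread, and it assumes the column header line starts with "Device".

```python
def parse_iostat(text):
    """Parse one `iostat -x` report into {device: {column: float}}."""
    rows = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Header is "Device:" in older sysstat releases, "Device" in newer.
        if fields[0].rstrip(":") == "Device":
            header = fields[1:]
            continue
        if header and len(fields) == len(header) + 1:
            # Some locales print decimal commas, as in this thread.
            values = [float(f.replace(",", ".")) for f in fields[1:]]
            rows[fields[0]] = dict(zip(header, values))
    return rows


def saturated(rows, util_limit=95.0):
    """Return device names whose %util is at or above util_limit.

    util_limit is an arbitrary illustrative threshold, not a vendor figure.
    """
    return sorted(d for d, cols in rows.items() if cols["%util"] >= util_limit)


if __name__ == "__main__":
    import sys
    # Usage: iostat -x 1 1 | python iostat_check.py
    report = parse_iostat(sys.stdin.read())
    for dev in saturated(report):
        print(dev, "await:", report[dev]["await"], "%util:", report[dev]["%util"])
```

Run against the 20-thread report above, it flags sdc, and also sdb, the backup source disk, which sits at 98.20% while being read.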


#5

I have changed the drive from a Samsung 850 EVO to a Samsung 845 DC EVO and the problem is solved.

Benchmarks

Samsung 850 EVO
        trans                                              device
        %>(ms)                                             %>(ms)
slice        1      2      4      8     16     32     64        1      2      4      8     16     32     64
-----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
    1    42.11  39.70  34.87  25.05   7.31   0.00   0.00    41.39  39.14  34.56  24.93   7.22   0.00   0.00
-----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
  avg    42.11  39.70  34.87  25.05   7.31   0.00   0.00    41.39  39.14  34.56  24.93   7.22   0.00   0.00
  max    42.11  39.70  34.87  25.05   7.31   0.00   0.00    41.39  39.14  34.56  24.93   7.22   0.00   0.00

Samsung 845 DC EVO
        trans                                              device
        %>(ms)                                             %>(ms)
slice        1      2      4      8     16     32     64        1      2      4      8     16     32     64
-----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
    1     1.28   0.17   0.00   0.00   0.00   0.00   0.00     1.27   0.16   0.00   0.00   0.00   0.00   0.00
-----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
  avg     1.28   0.17   0.00   0.00   0.00   0.00   0.00     1.27   0.16   0.00   0.00   0.00   0.00   0.00
  max     1.28   0.17   0.00   0.00   0.00   0.00   0.00     1.27   0.16   0.00   0.00   0.00   0.00   0.00
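
To compare the two results side by side, here is a small Python sketch that splits an "avg" row from the benchmark output into its trans and device halves and reports which latency buckets exceed a set of limits. The limits in `LIMITS` are illustrative assumptions, not values taken from this thread, and the function names are hypothetical; adjust both to your own acceptance criteria.

```python
# Latency thresholds (ms) in the order they appear in the benchmark header.
THRESHOLDS_MS = (1, 2, 4, 8, 16, 32, 64)

# Illustrative limits (assumption): max % of transactions allowed to exceed
# each threshold. Not taken from this thread.
LIMITS = {1: 5.0, 8: 1.0, 64: 0.1}


def parse_row(row):
    """Split a data row into (trans, device) maps of {threshold_ms: pct}."""
    fields = row.split()
    values = [float(v) for v in fields[1:]]  # fields[0] is the row label
    n = len(THRESHOLDS_MS)
    if len(values) != 2 * n:
        raise ValueError("expected %d values, got %d" % (2 * n, len(values)))
    trans = dict(zip(THRESHOLDS_MS, values[:n]))
    device = dict(zip(THRESHOLDS_MS, values[n:]))
    return trans, device


def over_limit(trans, limits=LIMITS):
    """Return {threshold_ms: pct} for every bucket above its limit."""
    return {ms: trans[ms] for ms, limit in limits.items() if trans[ms] > limit}


if __name__ == "__main__":
    rows = {
        "Samsung 850 EVO": "avg 42.11 39.70 34.87 25.05 7.31 0.00 0.00 "
                           "41.39 39.14 34.56 24.93 7.22 0.00 0.00",
        "Samsung 845 DC EVO": "avg 1.28 0.17 0.00 0.00 0.00 0.00 0.00 "
                              "1.27 0.16 0.00 0.00 0.00 0.00 0.00",
    }
    for name, row in rows.items():
        trans, _device = parse_row(row)
        print(name, "over limit:", over_limit(trans))
```

With these assumed limits, the 850 EVO is over at 1 ms (42.11%) and 8 ms (25.05%), while the 845 DC EVO stays under every limit.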