Aerospike write error 9


#1

Hi all,

We have an issue with Aerospike in a 2-node cluster. At random intervals (every 5-10 minutes) we get a CONNECTION TIMEOUT error from PHP 7.1. There are usually about 300 connections. All reads and writes go to one node. The server runs Debian 8, with a load average of about 0.2 and no iowait (two SSDs in RAID1).

Error from PHP library:

Aerospike write error 9 -> Timeout: timeout=3000 iterations=1 failedNodes=0 failedConns=0
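
To make the error line above easier to count or filter in application logs, here is a minimal sketch that splits it into its numeric fields. The function name parse_timeout_error is hypothetical and not part of any Aerospike client. The timeout value is presumably in milliseconds, and status 9 is the client's timeout status.

```python
import re

# Error line copied verbatim from the post above.
line = ("Aerospike write error 9 -> Timeout: timeout=3000 "
        "iterations=1 failedNodes=0 failedConns=0")

def parse_timeout_error(text):
    """Return (status_code, fields) parsed from a client timeout message.

    Hypothetical helper for illustration only.
    """
    code_match = re.search(r"error (\d+)", text)
    status = int(code_match.group(1)) if code_match else None
    # Collect every key=value pair whose value is an integer.
    fields = {key: int(value) for key, value in re.findall(r"(\w+)=(-?\d+)", text)}
    return status, fields

status, fields = parse_timeout_error(line)
print(status, fields)
```

If failedNodes and failedConns stay at 0 across all occurrences, that would suggest the client reached the node and timed out waiting for the reply, though that reading is an assumption.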

This is our configuration:

service {
        paxos-single-replica-limit 1
        auto-pin cpu
        nsup-startup-evict                true
        nsup-period                       60
        proto-fd-max 100000
}

logging {
        file /var/log/aerospike/aerospike.log {
                context any info
                context migrate debug
        }
        console {
                context any info
        }
}

network {
        service {
                address bond0
                port 3000
        }
        heartbeat {
                mode mesh                                 
                address 192.168.201.35                    
                port 3002                                
                mesh-seed-address-port 192.168.201.35 3002                        
                mesh-seed-address-port 192.168.201.36 3002
                interval 150
                timeout 10
        }
        fabric {
                port 3001
        }
        info {
                port 3003
        }
}

namespace codes {
        replication-factor 2
        memory-size 32G
        default-ttl 0
        high-water-memory-pct 95
        high-water-disk-pct   95
        stop-writes-pct 95

        storage-engine device {
                device /dev/vg/aerospike
                write-block-size 128K
                post-write-queue 2048
                max-write-cache 512M
        }
}

namespace live_stats {
        memory-size 1G          
        replication-factor 2    
        high-water-memory-pct 60
        stop-writes-pct 99      
        default-ttl 0           
        storage-engine memory   
}

Thank you.


#2

Any latency showing up in the histograms?


#3
Hello, 

Apr 19 2017 19:13:23 
           % > (ms) 
slice-to (sec)      1      8     64    ops/sec 
-------------- ------ ------ ------ ---------- 
19:13:33    10   1.30   0.00   0.00       23.1 
19:13:43    10   0.88   0.00   0.00       22.8 
19:13:53    10   0.42   0.00   0.00       23.6 
19:14:03    10   0.92   0.00   0.00       21.7 
19:14:13    10   0.38   0.00   0.00       26.5 
19:14:23    10   0.41   0.41   0.00       24.1 
19:14:33    10   0.00   0.00   0.00       21.6 
19:14:43    10   0.88   0.00   0.00       22.8 
19:14:53    10   0.00   0.00   0.00       20.6 
19:15:03    10   0.98   0.00   0.00       20.4 
19:15:14    11   0.47   0.47   0.00       19.3 
19:15:24    10   0.47   0.47   0.00       21.3 
19:15:34    10   0.78   0.00   0.00       25.6 
19:15:44    10   0.87   0.00   0.00       23.0 
19:15:54    10   2.21   0.44   0.00       22.6 
19:16:04    10   0.83   0.41   0.00       24.2 
19:16:14    10   0.77   0.38   0.00       26.0 
19:16:24    10   0.50   0.00   0.00       20.1 
19:16:34    10   0.46   0.00   0.00       21.6 
19:16:44    10   0.45   0.00   0.00       22.1 
19:16:54    10   0.42   0.00   0.00       23.8 
19:17:04    10  63.56  63.56  62.29       23.6 
19:17:14    10   1.26   0.00   0.00       23.8 
19:17:24    10   0.00   0.00   0.00       21.8 
19:17:34    10   1.37   0.00   0.00       21.9 
19:17:44    10   2.49   2.49   0.00       24.1 
19:17:54    10   2.88   2.40   0.48       20.8 
19:18:04    10   2.30   0.92   0.00       21.7 
19:18:14    10   0.00   0.00   0.00       24.5 
-------------- ------ ------ ------ ---------- 
 avg         3.04   2.48   2.16       22.0 
 max        63.56  63.56  62.29       26.5
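
For anyone reading along, a rough sketch of how the slices in the output above could be scanned for spikes automatically. It assumes the six-column row layout shown (time, slice seconds, % over 1/8/64 ms, ops/sec). The 5% threshold and the function names are arbitrary choices for illustration, not something the tool provides.

```python
# A few rows copied from the output above (a subset is enough to show the idea).
REPORT = """\
19:16:54    10   0.42   0.00   0.00       23.8
19:17:04    10  63.56  63.56  62.29       23.6
19:17:14    10   1.26   0.00   0.00       23.8
19:17:54    10   2.88   2.40   0.48       20.8
"""

def parse_rows(report):
    """Yield one dict per data row of the latency table."""
    for raw in report.splitlines():
        parts = raw.split()
        # Data rows have exactly six columns and start with HH:MM:SS.
        if len(parts) != 6 or parts[0].count(":") != 2:
            continue
        yield {
            "time": parts[0],
            "slice_sec": int(parts[1]),
            "pct_over_1ms": float(parts[2]),
            "pct_over_8ms": float(parts[3]),
            "pct_over_64ms": float(parts[4]),
            "ops_per_sec": float(parts[5]),
        }

def slow_slices(report, threshold_pct=5.0):
    """Return rows where more than threshold_pct of operations exceeded 64 ms."""
    return [row for row in parse_rows(report) if row["pct_over_64ms"] > threshold_pct]

spikes = slow_slices(REPORT)
for row in spikes:
    total_ops = row["ops_per_sec"] * row["slice_sec"]
    slow_ops = total_ops * row["pct_over_64ms"] / 100
    print(row["time"], round(total_ops), round(slow_ops))
```

On the rows above, only the 19:17:04 slice is flagged: about 236 operations in that 10-second slice, of which roughly 147 took longer than 64 ms.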

#4

That's very interesting. Is this latency reflected on just one particular node, or on all of them? Also, which histogram is this?