Aerospike losing documents when node goes down

djunior · September 17, 2015, 11:27pm

I’ve been running some tests with Aerospike and noticed behavior that differs from what is advertised.

I have a cluster of 4 nodes running on AWS in the same AZ. The instances are t2.micro (1 CPU, 1 GB RAM, 25 GB SSD) running Amazon Linux from the Aerospike AMI.

aerospike.conf:

heartbeat {
        mode mesh
        port 3002
        mesh-seed-address-port XXX.XX.XXX.164 3002
        mesh-seed-address-port XXX.XX.XXX.167 3002
        mesh-seed-address-port XXX.XX.XXX.165 3002
        # internal AWS IPs
...
namespace teste2 {
        replication-factor 2
        memory-size 650M
        default-ttl 365d
        storage-engine device {
                file /opt/aerospike/data/bar.dat
                filesize 22G
                data-in-memory false
        }
}

I ran a test to see whether I would lose documents when a node goes down. For that I wrote a short Python script:

from __future__ import print_function
import aerospike
import pandas as pd
import numpy as np
import time
import sys
config = {
  'hosts': [ ('XX.XX.XX.XX', 3000),('XX.XX.XX.XX',3000),
             ('XX.XX.XX.XX',3000), ('XX.XX.XX.XX',3000)]
} # external aws ips
client = aerospike.client(config).connect()
for i in range(1,10000):
  key = ('teste2', 'setTest3', ''.join(('p',str(i))))
  try:
    client.put(key, {'id11': i})
    print(i)
  except Exception as e:
    print("error: {0}".format(e), file=sys.stderr)
  time.sleep(1)

I used this code just to insert a sequence of integers that I could check afterwards. I ran it, and after a few seconds I stopped the aerospike service on one node for 10 seconds, using sudo service aerospike stop to stop it and sudo service aerospike coldstart to restart it.
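The insert loop prints errors but does not keep track of which ids failed, so afterwards it is hard to tell a write that was never acknowledged from a record that disappeared after a successful put. A minimal sketch of how that could be recorded, assuming a hypothetical helper `write_sequence` and a stand-in `flaky_put` in place of the real client call:

```python
def write_sequence(put, ids):
    """Call put(i) for each id and return (acked, failed).

    acked  -- ids whose put() returned without raising
    failed -- (id, error message) pairs for puts that raised
    """
    acked, failed = [], []
    for i in ids:
        try:
            put(i)
            acked.append(i)
        except Exception as e:
            failed.append((i, str(e)))
    return acked, failed


# Stand-in for the real write (hypothetical): every third put fails.
def flaky_put(i):
    if i % 3 == 0:
        raise RuntimeError("simulated write failure")


acked, failed = write_sequence(flaky_put, range(1, 8))
print("acknowledged:", acked)
print("failed:", [i for i, _ in failed])
```

With the real client, `put` could wrap the same call used in the loop above, for example `lambda i: client.put(('teste2', 'setTest3', 'p' + str(i)), {'id11': i})`. Only ids in `acked` that are missing later would count as lost records.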

I waited a few seconds until the nodes had finished all migrations and then ran the following Python script:

query = client.query('teste2', 'setTest3')
query.select('id11')
te = []
def save_result(result):
    # foreach passes each result as a (key, metadata, record) tuple
    key, metadata, record = result
    te.append(record)
query.foreach(save_result)
d = pd.DataFrame(te)
d2 = d.sort_values(by='id11')
te2 = np.array(d2.id11)
# report the first missing id of every gap in the sorted sequence
for i in range(1, len(te2)):
    if te2[i] != te2[i-1] + 1:
        print('no %d' % int(te2[i-1] + 1))
print(te2)

And got this output:

no 3
no 6
no 8
no 11
no 13
no 17
no 20
no 22
no 24
no 26
no 30
no 34
no 39
no 41
no 48
no 53
[ 1  2  5  7 10 12 16 19 21 23 25 27 28 29 33 35 36 37 38 40 43 44 45 46 47 51 52 54]
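The check script reports only the first missing id of each gap. A minimal sketch that lists every missing id, assuming the ids written so far form the contiguous range from 1 up to the largest id found (the helper name `missing_ids` is made up for illustration; the `found` list is copied from the output above):

```python
def missing_ids(found, first=1, last=None):
    """Return the sorted ids in [first, last] that are absent from found.

    If last is not given, the largest id in found is used (assumption:
    nothing beyond that id had been written yet).
    """
    present = set(found)
    if last is None:
        last = max(present)
    return [i for i in range(first, last + 1) if i not in present]


found = [1, 2, 5, 7, 10, 12, 16, 19, 21, 23, 25, 27, 28, 29,
         33, 35, 36, 37, 38, 40, 43, 44, 45, 46, 47, 51, 52, 54]
missing = missing_ids(found)
print(len(missing), "missing:", missing)
```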

Is my cluster misconfigured, or is this normal?

PS: I tried to include as much detail as I could. If there is more information I should add, please let me know.

Mnemaudsyne · September 18, 2015, 11:59pm

We see that you posted the same issue on Stack Overflow here, and since @kporter has already begun answering it there, let’s continue on that thread.

We are always happy to help, but please refrain from double-posting on the forum and on Stack Overflow … especially on the same day.

djunior · September 22, 2015, 8:58pm

Yeah, okay, my bad. Actually, nobody has started answering the question there yet.

I still don’t know why this happens. I was close to switching to Aerospike, but we can’t migrate until this problem is solved.

Thanks anyway
