AEROSPIKE_ERR_RECORD_NOT_FOUND just after INSERT

onokonem · May 6, 2015, 1:22pm

Hi,

I have a strange problem writing data to an Aerospike cluster:

aql> insert into storebig.Chunks (PK,Data) values ('5cb138284d431abd6a053a56625ec088bfb88912', '1234567890')
OK, 1 record affected.

aql> select * from storebig.Chunks where PK = '5cb138284d431abd6a053a56625ec088bfb88912'
Error: (2) AEROSPIKE_ERR_RECORD_NOT_FOUND

aql> insert into storebig.Chunks (PK,Data) values ('5cb138284d431abd6a053a56625ec088bfb88912', '1234567890')
Error: (1) AEROSPIKE_ERR_SERVER

The same thing happens with the golang client library (of course).

It is quite possible that the cluster is not healthy: some strange messages appear in the server logs:

May 06 2015 12:17:49 GMT: WARNING (drv_ssd): (drv_ssd.c::1236) read: read wrong key: expecting de6f0bc93bfdf560 got 8ad3dd7fce1ac7ec
May 06 2015 12:17:49 GMT: WARNING (drv_ssd): (drv_ssd.c::1236) read: read wrong key: expecting de6f0bc93bfdf560 got 8ad3dd7fce1ac7ec
May 06 2015 12:17:50 GMT: WARNING (drv_ssd): (drv_ssd.c::1230) read: bad block magic offset 29843600384
May 06 2015 12:17:50 GMT: WARNING (drv_ssd): (drv_ssd.c::1230) read: bad block magic offset 29843600384

My question is: what can I do to investigate the situation, debug and recover? Where to look and what to try?

Thank you.

With best regards, Daniel Podolsky

UPDATE

Config template (the actual config is generated from this template when the Docker container starts):

service {
  user root
  group root
  paxos-single-replica-limit 1
  pidfile /var/run/aerospike/asd.pid
  service-threads 4
  transaction-queues 4
  transaction-threads-per-queue 4
  proto-fd-max 15000
}

logging {
  file /storage/logs/aerospike.log {
    context any info
  }
  console {
    context any info
  }
}
network {
  service {
    address <%=os.getenv("NODE_EXT_ADDR")%>
    port 3000
  }
  fabric {
    address <%=os.getenv("NODE_INT_ADDR")%>
    port 3001
  }
  heartbeat {
    mode multicast
    address 239.1.99.2
    port 9918
    interface-address <%=os.getenv("NODE_INT_ADDR")%>
    interval 150
    timeout 10
  }
  info {
    address <%=os.getenv("NODE_INT_ADDR")%>
    port 3003
  }
}
namespace storebig {
  replication-factor 3
  memory-size <%=os.getenv("MEM_USE_BIG")%>K
  default-ttl 0
  high-water-disk-pct   98
  high-water-memory-pct 98
  stop-writes-pct       95
  storage-engine device {
    file /storage/data/big.dat
    filesize 3T
    data-in-memory false
  }
}
namespace storefast {
  replication-factor 3
  memory-size <%=os.getenv("MEM_USE_FAST")%>K
  default-ttl 0
  high-water-disk-pct   98
  high-water-memory-pct 98
  stop-writes-pct       95
  storage-engine device {
    file /storage/data/fast.dat
    filesize <%=os.getenv("MEM_USE_FAST")%>K
    data-in-memory true
  }
}
namespace storetest {
  replication-factor 3
  memory-size <%=os.getenv("MEM_USE_FAST")%>K
  default-ttl 0
  high-water-disk-pct   98
  high-water-memory-pct 98
  stop-writes-pct       95
  storage-engine device {
    file /storage/data/test.dat
    filesize 3T
    data-in-memory false
  }
}

kporter · May 7, 2015, 12:05am

After reading over your configuration, I believe I have found your problem. Individual devices and files in Aerospike can be no larger than 2 TiB, and yours are configured to 3 TiB. Regrettably, there is currently no check for this limit in the config parser, and I am unable to find a reference to it in our docs; both of these issues are being taken care of.
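Until the parser has such a check, something like the following could be run against a generated config before starting the node. It is only a rough sketch in Go, not Aerospike code: `parseSize` and `checkFilesize` are hypothetical helpers, and it assumes the `K`/`M`/`G`/`T` suffixes in `filesize` are binary multiples.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxFileBytes is the 2 TiB per-file / per-device limit mentioned above.
const maxFileBytes uint64 = 2 << 40

// parseSize converts a size such as "3T" or "512G" into bytes.
// Assumption: suffixes K, M, G, T mean KiB, MiB, GiB, TiB.
func parseSize(s string) (uint64, error) {
	s = strings.ToUpper(strings.TrimSpace(s))
	if s == "" {
		return 0, fmt.Errorf("empty size")
	}
	shifts := map[byte]uint{'K': 10, 'M': 20, 'G': 30, 'T': 40}
	var shift uint
	if sh, ok := shifts[s[len(s)-1]]; ok {
		shift = sh
		s = s[:len(s)-1]
	}
	n, err := strconv.ParseUint(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	return n << shift, nil
}

// checkFilesize returns an error when the value exceeds the 2 TiB limit.
func checkFilesize(value string) error {
	n, err := parseSize(value)
	if err != nil {
		return err
	}
	if n > maxFileBytes {
		return fmt.Errorf("filesize %s (%d bytes) exceeds the 2 TiB limit", value, n)
	}
	return nil
}

func main() {
	// "3T" is the value from the storebig and storetest namespaces.
	for _, v := range []string{"3T", "2T", "512G"} {
		if err := checkFilesize(v); err != nil {
			fmt.Println("FAIL:", err)
		} else {
			fmt.Println("ok:", v)
		}
	}
}
```

The check only looks at a single `filesize` value; pulling the values out of the rendered config file is left to whatever generates it.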

You can instead use multiple files to store your data for each namespace (each file limited to 2 TiB). As discussed elsewhere, you will likely see better performance by using multiple files or devices for a given namespace.
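As a rough illustration of the split, here is a small Go sketch that works out how many files a given capacity needs and prints a matching stanza. The file names (`big-0.dat`, ...) and the `splitPlan` helper are hypothetical, and it assumes that one `filesize` line applies to every `file` line in the `storage-engine device` block.

```go
package main

import "fmt"

// maxFileBytes is the 2 TiB per-file limit.
const maxFileBytes uint64 = 2 << 40

// splitPlan returns how many files are needed to hold total bytes when no
// file may exceed limit, and the size of each file (rounded up).
func splitPlan(total, limit uint64) (count, perFile uint64) {
	if total == 0 || limit == 0 {
		return 0, 0
	}
	count = (total + limit - 1) / limit   // ceiling division
	perFile = (total + count - 1) / count // spread the capacity evenly
	return count, perFile
}

func main() {
	total := uint64(3) << 40 // the 3T originally configured for storebig
	count, perFile := splitPlan(total, maxFileBytes)

	// Express the per-file size in GiB, rounded up.
	gib := (perFile + (1 << 30) - 1) >> 30

	fmt.Println("  storage-engine device {")
	for i := uint64(0); i < count; i++ {
		// Hypothetical file names, for illustration only.
		fmt.Printf("    file /storage/data/big-%d.dat\n", i)
	}
	fmt.Printf("    filesize %dG\n", gib)
	fmt.Println("    data-in-memory false")
	fmt.Println("  }")
}
```

For the 3T case this comes out as two files of 1536G each; more, smaller files would also stay under the limit.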

onokonem · May 7, 2015, 4:48am

It would indeed be helpful to have such a check.

Thank you. I will try multiple files today.

PS

Am I the first person in the world to load more than 2T of data per node into an Aerospike cluster? I’m proud!

anshu · May 7, 2015, 1:58pm

Not really. There are production clusters with 8TB per node; they just use multiple 1 or 2 TB disks. So yes, you are the first one to try with a single file / disk bigger than 2 TB.
