
I have a strange problem writing data to an Aerospike cluster:

aql> insert into storebig.Chunks (PK,Data) values ('5cb138284d431abd6a053a56625ec088bfb88912', '1234567890')
OK, 1 record affected.

aql> select * from storebig.Chunks where PK = '5cb138284d431abd6a053a56625ec088bfb88912'

aql> insert into storebig.Chunks (PK,Data) values ('5cb138284d431abd6a053a56625ec088bfb88912', '1234567890')

Same story with the Go client library (of course).

It is quite possible that the cluster is not healthy; some strange messages appear in the servers' logs:

May 06 2015 12:17:49 GMT: WARNING (drv_ssd): (drv_ssd.c::1236) read: read wrong key: expecting de6f0bc93bfdf560 got 8ad3dd7fce1ac7ec
May 06 2015 12:17:49 GMT: WARNING (drv_ssd): (drv_ssd.c::1236) read: read wrong key: expecting de6f0bc93bfdf560 got 8ad3dd7fce1ac7ec
May 06 2015 12:17:50 GMT: WARNING (drv_ssd): (drv_ssd.c::1230) read: bad block magic offset 29843600384
May 06 2015 12:17:50 GMT: WARNING (drv_ssd): (drv_ssd.c::1230) read: bad block magic offset 29843600384

My question is: what can I do to investigate the situation, debug it, and recover? Where should I look and what should I try?

Thank you.

With best regards, Daniel Podolsky


Config template (the actual config is generated from this template when the Docker container starts):

service {
  user root
  group root
  paxos-single-replica-limit 1
  pidfile /var/run/aerospike/
  service-threads 4
  transaction-queues 4
  transaction-threads-per-queue 4
  proto-fd-max 15000
}

logging {
  file /storage/logs/aerospike.log {
    context any info
  }
  console {
    context any info
  }
}
network {
  service {
    address <%=os.getenv("NODE_EXT_ADDR")%>
    port 3000
  }
  fabric {
    address <%=os.getenv("NODE_INT_ADDR")%>
    port 3001
  }
  heartbeat {
    mode multicast
    port 9918
    interface-address <%=os.getenv("NODE_INT_ADDR")%>
    interval 150
    timeout 10
  }
  info {
    address <%=os.getenv("NODE_INT_ADDR")%>
    port 3003
  }
}
namespace storebig {
  replication-factor 3
  memory-size <%=os.getenv("MEM_USE_BIG")%>K
  default-ttl 0
  high-water-disk-pct   98
  high-water-memory-pct 98
  stop-writes-pct       95
  storage-engine device {
    file /storage/data/big.dat
    filesize 3T
    data-in-memory false
  }
}
namespace storefast {
  replication-factor 3
  memory-size <%=os.getenv("MEM_USE_FAST")%>K
  default-ttl 0
  high-water-disk-pct   98
  high-water-memory-pct 98
  stop-writes-pct       95
  storage-engine device {
    file /storage/data/fast.dat
    filesize <%=os.getenv("MEM_USE_FAST")%>K
    data-in-memory true
  }
}
namespace storetest {
  replication-factor 3
  memory-size <%=os.getenv("MEM_USE_FAST")%>K
  default-ttl 0
  high-water-disk-pct   98
  high-water-memory-pct 98
  stop-writes-pct       95
  storage-engine device {
    file /storage/data/test.dat
    filesize 3T
    data-in-memory false
  }
}

After reading over your configuration, I believe I have found your problem. Individual devices and files in Aerospike can be no larger than 2 TiB, and yours are configured as 3 TiB. Regrettably, there currently isn't a check in the config parser for this limit, and I am unable to find a reference to it in our docs. Both of these issues are being taken care of.

You can instead use multiple files to store the data for each namespace (each file is limited to 2 TiB). As discussed elsewhere, you will likely see better performance by using multiple files or devices for a given namespace.
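
As a rough illustration of the multiple-file approach, the storebig namespace from your template might be split as shown here. This is only a sketch: the file names big-1.dat and big-2.dat are hypothetical, and it assumes that filesize applies to each file individually, so that two files of 1536G give the same 3T total as before. Please check it against the configuration reference for your server version before relying on it.

```conf
namespace storebig {
  replication-factor 3
  memory-size <%=os.getenv("MEM_USE_BIG")%>K
  default-ttl 0
  high-water-disk-pct   98
  high-water-memory-pct 98
  stop-writes-pct       95
  storage-engine device {
    # Hypothetical file names; each file stays below the 2 TiB per-file limit.
    file /storage/data/big-1.dat
    file /storage/data/big-2.dat
    # Assumption: filesize is per file, so 2 x 1536G = 3T in total.
    filesize 1536G
    data-in-memory false
  }
}
```

The same pattern would apply to storetest, which is also configured with a single 3T file.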

It would indeed be helpful to have such a check :smile:

Thank you. I will try multiple files today.


Am I the first person in the world to load more than 2 TB of data per node into an Aerospike cluster? I'm proud!


Not really :slight_smile: There are production clusters with 8 TB per node. They just use multiple 1 TB or 2 TB disks. So yes, you are the first one to try with a single file or disk bigger than 2 TB :slight_smile: