We are running on Amazon EC2 (kernel 4.1.10-17.31.amzn1.x86_64 #1 SMP Sat Oct 24 01:31:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux) with all system updates installed.
Which version were you using prior to 3.7.0.2?
Could you share more about your use case and the features you are using?
Can you supply your Aerospike configuration so we can attempt to reproduce the issue? (You might want to mask the IPs and other sensitive information.)
From a brief look at the stack trace, it seems you are using secondary indexes. Can you give us more information about your secondary index use case? Also, has this issue occurred again since the initial crash? If it has, we would like the latest stack trace as well.
I am Dean’s colleague. We were using 3.6.4 CE on AWS, which worked fine. Then we upgraded to 3.7.0.1, and the server worked for about a day before 2 nodes started shutting down repeatedly. We upgraded to 3.7.0.2 and the problem recurred. After that we downgraded back to 3.6.4, and the server works now.
We are using Aerospike (AS) as the main database for our web application. Currently AS is not under heavy load, as it is used in a DEV environment. We are using 25 sets and 43 secondary indexes.
Info about the indexes (values seen across all of them):
indextype: NONE, LIST, MAPKEYS
num_bins: 1
state: RW
sync_state: synced
type: STRING, NUMERIC
Configuration:
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address any
        port 3000
    }

    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.

        # List one or more other nodes, one ip-address & port per line:
        mesh-seed-address-port someIp 3002
        mesh-seed-address-port someIp 3002

        interval 250
        timeout 10
    }

    fabric {
        port 3001
    }

    info {
        port 3003
    }
}

namespace someNamespace {
    replication-factor 3
    memory-size 10G
    default-ttl 0 # use 0 to never expire/evict.

    storage-engine device {
        device /dev/sdb
        # The 2 lines below optimize for SSD.
        scheduler-mode noop
        write-block-size 128K
    }
}
This is likely caused by the secondary index code not handling empty lists.
If an empty list was ever stored in a bin covered by a secondary index, it would cause a divide-by-zero crash.
It’ll be fixed shortly.
If possible, please confirm whether empty lists existed in, or were being written to, indexed bins when the crashes happened. Thank you.