One of the issues we encountered is that when memory caching (data-in-memory) is enabled, the “asd” daemon crashes after about 30-60 seconds, even under low load (a few thousand inserted records, with no more than 10 bins each).
The Amazon machine (c3.xlarge) running Aerospike has 7.3GB of memory, of which only about 2-3% was being used by asd at the time of the crash. The Aerospike deployment has just one node.
We have temporarily disabled the data-in-memory feature (as you can see in the config below). While performance is not critical for the functional tests, we will need data-in-memory for the load tests we have planned for the upcoming week.
Any help fixing this issue would be greatly appreciated.
Technical details below.
Configuration:
# Aerospike database configuration file.
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}
logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode multicast
        address 239.1.99.222
        port 9918
        # To use unicast-mesh heartbeats, remove the 3 lines above, and see
        # aerospike_mesh.conf for alternative.
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
#namespace test {
#    replication-factor 2
#    memory-size 1G
#    default-ttl 30d # 30 days, use 0 to never expire/evict.
#
#    storage-engine memory
#}
namespace XXXXXXXXX {
    replication-factor 1
    memory-size 1G
    default-ttl 30d # 30 days, use 0 to never expire/evict.
    #storage-engine memory
    # To use file storage backing, comment out the line above and use the
    # following lines instead.
    storage-engine device {
        file /opt/aerospike/data/bar.dat
        filesize 16G
        #data-in-memory true # Store data in memory in addition to file.
    }
}
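For comparison, the variant that crashes differs only in the storage-engine block, where data-in-memory is switched on. A sketch of that block, assuming the rest of the namespace stays exactly as posted above:

```
storage-engine device {
    file /opt/aerospike/data/bar.dat
    filesize 16G
    data-in-memory true # Store data in memory in addition to file. Enabling this line is what precedes the crash.
}
```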
The section of the Aerospike log describing the crash:
Mar 08 2016 10:30:16 GMT: WARNING (as): (signal.c::161) SIGSEGV received, aborting Aerospike Community Edition build 3.7.4 os el6
Mar 08 2016 10:30:16 GMT: WARNING (as): (signal.c::163) stacktrace: found 7 frames
Mar 08 2016 10:30:16 GMT: WARNING (as): (signal.c::163) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_segv+0x32) [0x48d828]
Mar 08 2016 10:30:16 GMT: WARNING (as): (signal.c::163) stacktrace: frame 1: /lib64/libc.so.6(+0x35670) [0x7fe70a020670]
Mar 08 2016 10:30:16 GMT: WARNING (as): (signal.c::163) stacktrace: frame 2: /usr/bin/asd(cf_queue_push+0xc) [0x54b0e1]
Mar 08 2016 10:30:16 GMT: WARNING (as): (signal.c::163) stacktrace: frame 3: /usr/bin/asd(ssd_post_write+0x3d2) [0x518443]
Mar 08 2016 10:30:16 GMT: WARNING (as): (signal.c::163) stacktrace: frame 4: /usr/bin/asd(ssd_write_worker+0x14c) [0x5187c5]
Mar 08 2016 10:30:16 GMT: WARNING (as): (signal.c::163) stacktrace: frame 5: /lib64/libpthread.so.0(+0x7dc5) [0x7fe70b1f3dc5]
Mar 08 2016 10:30:16 GMT: WARNING (as): (signal.c::163) stacktrace: frame 6: /lib64/libc.so.6(clone+0x6d) [0x7fe70a0e1bdd]