Hi Aerospike Team,
We have been using Aerospike for quiet some time and it was working fine till some time back. for past few weeks, we are noticing a strange behavior. the aerospike service gets started. but it gets stopped after some time. Checked the logs. They look clean. Can somebody help? /etc/aerospike/aerospike.conf
# Aerospike database configuration file.
service {
user root
group root
paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
pidfile /var/run/aerospike/asd.pid
service-threads 4
transaction-queues 4
transaction-threads-per-queue 4
proto-fd-max 15000
}
logging {
# Log file must be an absolute path.
file /var/log/aerospike/aerospike.log {
context any info
}
#file /var/log/aerospike/aerospike_debug.log {
#context any debug
#}
file /var/log/aerospike/aerospike_critical.log {
context any critical
}
#file /var/log/aerospike/aerospike_warning.log {
#context any warning
#}
#file /var/log/aerospike/aerospike_detail.log {
#context any detail
#}
}
network {
service {
address any
port 3000
}
heartbeat {
mode mesh
#address 239.1.99.222
port 3002
# To use unicast-mesh heartbeats, remove the 3 lines above, and see
# aerospike_mesh.conf for alternative.
mesh-seed-address-port 172.31.7.207 3002
interval 300
timeout 10
}
fabric {
address any
port 3001
}
info {
port 3003
}
}
# In the Test setup the instance type assummed is t2.medium
namespace cdbapplication {
replication-factor 1
memory-size 20G
default-ttl 0 # 30 days, use 0 to never expire/evict.
storage-engine device {
#device /dev/xvdi
file /asdb/aerospike/data/cdbapplication.dat
filesize 20G
data-in-memory true # Store data in memory in addition to file.
}
}
namespace cdbcontent {
replication-factor 1
memory-size 8G
default-ttl 0
storage-engine device {
#device /dev/xvdj
file /asdb/aerospike/data/cdbcontent.dat
filesize 40G
data-in-memory false
}
}
Critical Logs:
Warning Logs:
Oct 28 2020 18:09:59 GMT: WARNING (cf:socket): (socket.c::371) Error in delayed connect() to 172.31.14.210:3002: timed out
Oct 28 2020 18:10:30 GMT: WARNING (cf:socket): (socket.c::371) Error in delayed connect() to 172.31.14.210:3002: timed out
Oct 28 2020 18:11:02 GMT: WARNING (cf:socket): (socket.c::371) Error in delayed connect() to 172.31.14.210:3002: timed out
Oct 28 2020 18:11:33 GMT: WARNING (cf:socket): (socket.c::371) Error in delayed connect() to 172.31.14.210:3002: timed out
Oct 28 2020 18:12:05 GMT: WARNING (cf:socket): (socket.c::371) Error in delayed connect() to 172.31.14.210:3002: timed out
Oct 28 2020 18:12:36 GMT: WARNING (cf:socket): (socket.c::371) Error in delayed connect() to 172.31.14.210:3002: timed out
Oct 28 2020 18:13:02 GMT: WARNING (as): (signal.c::171) SIGTERM received, shutting down
Oct 28 2020 18:14:49 GMT: WARNING (hb): (hb.c::1331) rejected tip for self: 172.31.7.207:3002
Oct 28 2020 18:14:49 GMT: WARNING (hb): (hb.c::2127) couldn't add remote heartbeat service 172.31.7.207:3002 to mesh host list
Oct 28 2020 18:21:25 GMT: WARNING (as): (signal.c::171) SIGTERM received, shutting down
Info Logs:
Oct 29 2020 15:54:43 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Oct 29 2020 15:54:43 GMT: INFO (info): (hist.c::137) histogram dump: udf (523 total) msec
Oct 29 2020 15:54:43 GMT: INFO (info): (hist.c::154) (10: 0000000005) (11: 0000000009) (17: 0000000048) (18: 0000000075)
Oct 29 2020 15:54:43 GMT: INFO (info): (hist.c::154) (19: 0000000137) (20: 0000000174) (21: 0000000071) (22: 0000000004)
Oct 29 2020 15:54:43 GMT: INFO (info): (hist.c::137) histogram dump: query (150 total) msec
Oct 29 2020 15:54:43 GMT: INFO (info): (hist.c::154) (00: 0000000109) (01: 0000000010) (02: 0000000021) (03: 0000000003)
Oct 29 2020 15:54:43 GMT: INFO (info): (hist.c::154) (04: 0000000003) (06: 0000000001) (07: 0000000002) (17: 0000000001)
Oct 29 2020 15:54:43 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (149 total) count
Oct 29 2020 15:54:43 GMT: INFO (info): (hist.c::154) (01: 0000000087) (03: 0000000001) (04: 0000000009) (05: 0000000015)
Oct 29 2020 15:54:43 GMT: INFO (info): (hist.c::154) (06: 0000000015) (07: 0000000017) (11: 0000000003) (13: 0000000002)
Oct 29 2020 15:54:46 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /asdb/aerospike/data/cdbcontent.dat: used 727486848, contig-free 40261M (40261 wblocks), swb-free 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 9 (0.0/s) defrag-w-tot 0 (0.0/s)
^[[AOct 29 2020 15:54:47 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /asdb/aerospike/data/cdbapplication.dat: used 122577536, contig-free 20359M (20359 wblocks), swb-free 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 8 (0.0/s) defrag-w-tot 0 (0.0/s)
Oct 29 2020 15:54:51 GMT: INFO (nsup): (thr_nsup.c::1206) {cdbapplication} nsup start
Oct 29 2020 15:54:51 GMT: INFO (nsup): (thr_nsup.c::1130) {cdbapplication} Records: 247394, 247394 0-vt, 0(0) expired, 0(0) evicted, 0(0) set deletes. Evict ttls: 0,0,0.000. Waits: 0,0,0. Total time: 52 ms
Oct 29 2020 15:54:51 GMT: INFO (nsup): (thr_nsup.c::1206) {cdbcontent} nsup start
Oct 29 2020 15:54:51 GMT: INFO (nsup): (thr_nsup.c::1130) {cdbcontent} Records: 619669, 619669 0-vt, 0(0) expired, 0(0) evicted, 0(0) set deletes. Evict ttls: 0,0,0.000. Waits: 0,0,0. Total time: 138 ms
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::4979) system memory: free 14987136kb ( 95 percent free )
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::4985) ClusterSize 1 ::: objects 867063 ::: sub_objects 0
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::4994) rec refs 867063 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::4999) replica errs :: null 0 non-null 0 ::: sync copy errs :: master 0
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::5009) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: iq 0 ::: dq 0 : fds - proto (9, 10043, 10034) : hb (0, 0, 0) : fab (16, 16, 0)
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::5011) heartbeat_received: self 0 : foreign 0
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::5012) heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::5024) tree_counts: nsup 0 scan 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::5041) {cdbapplication} disk bytes used 122577536 : avail pct 99
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::5043) {cdbapplication} memory bytes used 82020200 (index 15833216 : sindex 6380887 : data 59806097) : used pct 0.38
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::5091) {cdbapplication} migrations - complete
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::5053) {cdbcontent} disk bytes used 727486848 : avail pct 98 : cache-read pct 0.00
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::5055) {cdbcontent} memory bytes used 76232989 (index 39658816 : sindex 36574173) : used pct 0.89
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::5091) {cdbcontent} migrations - complete
Oct 29 2020 15:54:53 GMT: INFO (info): (thr_info.c::5098) partitions: actual 8192 sync 0 desync 0 zombie 0 absent 0
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::137) histogram dump: reads (52440 total) msec
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::154) (00: 0000021699) (01: 0000009918) (02: 0000013232) (03: 0000006376)
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::154) (06: 0000001089) (07: 0000000012) (17: 0000000001) (18: 0000000002)
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::163) (19: 0000000109) (20: 0000000001) (21: 0000000001)
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (34 total) msec
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::163) (00: 0000000034)
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::137) histogram dump: udf (523 total) msec
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::154) (10: 0000000005) (11: 0000000009) (17: 0000000048) (18: 0000000075)
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::154) (19: 0000000137) (20: 0000000174) (21: 0000000071) (22: 0000000004)
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::137) histogram dump: query (150 total) msec
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::154) (00: 0000000109) (01: 0000000010) (02: 0000000021) (03: 0000000003)
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::154) (04: 0000000003) (06: 0000000001) (07: 0000000002) (17: 0000000001)
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (149 total) count
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::154) (01: 0000000087) (03: 0000000001) (04: 0000000009) (05: 0000000015)
Oct 29 2020 15:54:53 GMT: INFO (info): (hist.c::154) (06: 0000000015) (07: 0000000017) (11: 0000000003) (13: 0000000002)