Running: aerospike-server-community-3.12.0-1.el6.x86_64
I have a three-node cluster, and one node had an SSD failure: total data loss, but not a big deal since it was just a cache. Now I need to put another drive in there and get that asd instance back into the cluster. Please keep in mind that I am not an experienced Aerospike user/admin; I know just enough to get what I need working. My config is below. I am using multicast for the heartbeat, and based on what I have read, all I need to do is prep the new SSD, replace the failed drive, fire asd back up and presto, everything should be cool.
It can’t be that easy. What am I missing? Any advice on checks to run after getting things back up and running?
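One post-restart check is to confirm that every node sees a three-node cluster again and that partition migrations have finished. Below is a minimal Python sketch, assuming you feed it the semicolon-separated `key=value` text printed by `asinfo -v statistics` on each node. The stat names `cluster_size` and `migrate_partitions_remaining` are assumptions to verify against the actual output of your 3.12 server, and `parse_info` / `check_node` are hypothetical helper names.

```python
def parse_info(response: str) -> dict[str, str]:
    """Split an info response such as 'k1=v1;k2=v2' into a dict."""
    stats: dict[str, str] = {}
    for pair in response.strip().split(";"):
        if "=" in pair:  # skip empty fragments, e.g. from a trailing ';'
            key, _, value = pair.partition("=")
            stats[key] = value
    return stats


def check_node(response: str, expected_cluster_size: int = 3) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    stats = parse_info(response)
    problems: list[str] = []

    # Assumed stat name: number of nodes this node currently sees.
    size = stats.get("cluster_size")
    if size is None:
        problems.append("cluster_size not reported")
    elif int(size) != expected_cluster_size:
        problems.append(f"cluster_size is {size}, expected {expected_cluster_size}")

    # Assumed stat name: partitions still waiting to migrate (0 when done).
    remaining = stats.get("migrate_partitions_remaining")
    if remaining is None:
        problems.append("migrate_partitions_remaining not reported")
    elif int(remaining) > 0:
        problems.append(f"{remaining} partitions still migrating")

    return problems


if __name__ == "__main__":
    # Hypothetical sample; paste the real `asinfo -v statistics` output here.
    sample = "cluster_size=3;migrate_partitions_remaining=0;uptime=120"
    print(check_node(sample) or "OK")
```

Run it against the output from each of the three nodes: a missing stat is reported rather than silently treated as passing, so a renamed metric shows up as a problem instead of a false "OK".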
```
# Aerospike database configuration file.
service {
    user aerospike
    group aerospike
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 8
    transaction-queues 8
    transaction-threads-per-queue 8
    proto-fd-max 15000
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address any
        port 3000
    }

    heartbeat {
        mode multicast
        multicast-group 239.1.99.222
        port 9918

        # To use unicast-mesh heartbeats, remove the 3 lines above, and see
        # aerospike_mesh.conf for alternative.

        interval 150
        timeout 10
    }

    fabric {
        port 3001
    }

    info {
        port 3003
    }
}

namespace tmcache {
    replication-factor 1
    memory-size 24G # around 250 million records
    default-ttl 30d
    high-water-disk-pct 80
    high-water-memory-pct 85
    storage-engine device {
        device /dev/sdc
        scheduler-mode noop
        ## WARNING: you can raise, but cannot lower without zeroing disk
        write-block-size 256K
    }
}
```
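A few numbers in this config can be sanity-checked with simple arithmetic before the node rejoins. This is a minimal Python sketch under stated assumptions: only the primary index lives in RAM (data is stored on `/dev/sdc` and `data-in-memory` is not set), each record costs 64 bytes of index, other overhead such as sets or secondary indexes is ignored, and the "around 250 million records" comment is taken as a per-node figure. It also assumes the heartbeat `timeout` counts missed `interval`s, so a dead node is declared after roughly `interval * timeout` milliseconds. All function names are hypothetical.

```python
GIB = 1024 ** 3
INDEX_BYTES_PER_RECORD = 64  # assumed primary-index cost per record


def index_bytes(records: int) -> int:
    """RAM used by the primary index for `records` records."""
    return records * INDEX_BYTES_PER_RECORD


def records_at_high_water(memory_size_gib: int, high_water_pct: int) -> int:
    """Records whose index fits before high-water-memory-pct triggers evictions."""
    budget = memory_size_gib * GIB * high_water_pct // 100
    return budget // INDEX_BYTES_PER_RECORD


def failure_detection_ms(interval_ms: int, timeout_intervals: int) -> int:
    """Approximate time before a silent node is declared dead."""
    return interval_ms * timeout_intervals


if __name__ == "__main__":
    # Values taken from the namespace and heartbeat sections of the config.
    print(f"250M records -> {index_bytes(250_000_000) / GIB:.1f} GiB of index")
    print(f"24G at 85% high water -> {records_at_high_water(24, 85):,} records")
    print(f"node declared dead after ~{failure_detection_ms(150, 10)} ms")
```

With these values, 250 million records need about 14.9 GiB of index, comfortably below the roughly 342 million records that 24G holds before the 85% high-water mark, and a lost node is noticed after about 1.5 seconds.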