I am running a 3-node cluster in production on Aerospike Community Edition (3.15.0.1).
Current config of one node:
# Aerospike database configuration file for deployments using mesh heartbeats.
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    proto-fd-max 15000
}
logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.
        # List one or more other nodes, one ip-address & port per line:
        mesh-seed-address-port BOX1 3002
        mesh-seed-address-port BOX2 3002
        mesh-seed-address-port BOX3 3002
        interval 250
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace XXXXXX {
    replication-factor 2
    memory-size 40G
    default-ttl 0
    storage-engine device {
        device /dev/xvdb
        write-block-size 1024K
        data-in-memory true
    }
}
One node of my cluster needs maintenance and will have to be restarted.
What is the best way to handle this so that my app doesn’t face any downtime?
I was planning to add a new node to the cluster and then remove the node that is going under maintenance. If I do this, do I need to restart the other nodes in the cluster as well, since the config of each node lists the IPs of all nodes?
Or should I keep the node that is going under maintenance in the cluster and simply restart it?
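To make the first option concrete, this is roughly the heartbeat stanza I would expect to put on the new node. It is only a sketch: BOX4 is a hypothetical name for the new machine, and I am assuming the new node would list the three existing nodes as seeds and otherwise reuse the same ports and timings as the config above.

```
# Sketch of the heartbeat stanza on the new node (hypothetical host BOX4).
# Assumption: the existing nodes BOX1..BOX3 are used as mesh seeds.
heartbeat {
    mode mesh
    port 3002 # Heartbeat port for this node.
    mesh-seed-address-port BOX1 3002
    mesh-seed-address-port BOX2 3002
    mesh-seed-address-port BOX3 3002
    interval 250
    timeout 10
}
```

The existing nodes would keep their current seed lists, which do not mention BOX4. That is the part I am unsure about.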