I am trying to upgrade from 4.3.x.x to 4.4.x.x.
I tested the upgrade on a test server (CentOS 7),
but shortly afterwards some data reverted to old values.
I upgraded using the process below.
Stop the Aerospike server:
    sudo systemctl stop aerospike
Download 4.4.0.6.
Install 4.4.0.6:
    sudo ./asinstall
Start the Aerospike server:
    sudo systemctl start aerospike
Any ideas, please?
My aerospike.conf is below.
service {
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    proto-fd-max 15000
}
logging {
    console {
        context any info
    }
}
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode multicast
        multicast-group 239.1.99.222
        port 9918
        # To use unicast-mesh heartbeats, remove the 3 lines above, and see
        # aerospike_mesh.conf for alternative.
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace set {
    replication-factor 2
    memory-size 2G
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    storage-engine device {
        file /opt/aerospike/data/set.dat
        filesize 25G
        data-in-memory false # Store data in memory in addition to file.
        write-block-size 128K
    }
}
namespace play {
    replication-factor 2
    memory-size 2G
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    storage-engine device {
        file /opt/aerospike/data/play.dat
        filesize 25G
        data-in-memory false # Store data in memory in addition to file.
        write-block-size 4M
    }
}
The other thing I can think of is a generation wrap-around while a node was down to be upgraded, causing the older record (which now has the higher generation) to take over when the node comes back (refer to conflict-resolution-policy).
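To illustrate the wrap-around scenario, here is a small Python sketch. It is a toy model, not Aerospike code: the 16-bit wrap limit (65535 back to 1), the `Record` type and the two `resolve_*` helpers are assumptions made for illustration. It shows how a generation-based policy can prefer a stale copy once the counter has wrapped, while a policy based on last update time would not.

```python
from dataclasses import dataclass

# Assumption for this toy model: the generation counter is 16 bits wide
# and wraps from 65535 back to 1.
MAX_GENERATION = 65535


@dataclass(frozen=True)
class Record:
    value: str
    generation: int
    last_update_time: int  # arbitrary, monotonically increasing clock


def write(record: Record, value: str, now: int) -> Record:
    """Return the record after one more write, wrapping the generation."""
    next_gen = 1 if record.generation >= MAX_GENERATION else record.generation + 1
    return Record(value, next_gen, now)


def resolve_by_generation(a: Record, b: Record) -> Record:
    """Toy 'generation' policy: the copy with the higher generation wins."""
    return a if a.generation >= b.generation else b


def resolve_by_last_update_time(a: Record, b: Record) -> Record:
    """Toy 'last-update-time' policy: the most recently written copy wins."""
    return a if a.last_update_time >= b.last_update_time else b


# Copy held by the node that goes down for the upgrade.
stale = Record("old", generation=65534, last_update_time=100)

# The rest of the cluster keeps writing while that node is down.
live = stale
for tick, value in enumerate(["v1", "v2", "new"], start=101):
    live = write(live, value, tick)

print(live.generation)                                 # wrapped to a small number
print(resolve_by_generation(stale, live).value)        # stale copy wins
print(resolve_by_last_update_time(stale, live).value)  # newest copy wins
```

In a real cluster the second policy depends on node clocks being reasonably in sync, so treat this only as a way to reason about the failure mode.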
There may be other edge cases, but they would be less common. Enterprise licensees can provide logs to Aerospike Support for in-depth analysis.
Backup and restore is definitely one common way, and probably the most straightforward, but it of course requires pausing write traffic to keep the data as consistent as possible.
Well, we are working on the assumption that the cold restart is what resurrects deleted records, but it could be a number of things. The alternative suggestions from my side all involve the Enterprise Edition (to avoid cold restarts, to potentially use XDR to migrate directly to a different cluster, and maybe consider strong consistency / durable delete to fully close the door on any inconsistencies).
Thanks
Could you please explain the exact meaning of
"maybe consider strong consistency / durable delete to fully close the door on any inconsistencies"?
Sure. When you said "some data changed to old data", it means that you ended up with inconsistent data. The source of the inconsistency can vary, and we only guessed that it could have been caused by the cold restart. If that is the cause, you can of course empty the storage on a node before restarting it, then wait for migrations to complete before moving on to the next node.
There are other situations that could cause inconsistencies (split brains) when Aerospike operates in Available Mode. Aerospike Enterprise Edition can be configured to run in Strong Consistency Mode.
Running in strong consistency mode defaults to using durable deletes which would create tombstones and prevent resurrection of deleted records upon cold restart.
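As a rough illustration of why a cold restart can bring deleted records back, and what a tombstone changes, here is a toy Python model. Everything in it (the `ToyNode` class, its `device` log and `index` dict) is hypothetical and greatly simplified. It only assumes that a non-durable delete drops the in-memory index entry without persisting anything, and that a cold restart rebuilds the index by scanning the device.

```python
TOMBSTONE = object()  # marker persisted by a durable delete


class ToyNode:
    """Hypothetical, simplified model of one node: an append-only device
    plus an in-memory index that a cold restart rebuilds from the device."""

    def __init__(self):
        self.device = []   # persisted (key, value) entries, oldest first
        self.index = {}    # in-memory view: key -> current value

    def put(self, key, value):
        self.device.append((key, value))
        self.index[key] = value

    def delete(self, key, durable=False):
        if durable:
            # A durable delete persists a tombstone alongside the data.
            self.device.append((key, TOMBSTONE))
        # Either way the key disappears from the in-memory index.
        self.index.pop(key, None)

    def cold_restart(self):
        # Rebuild the index by replaying the device; the latest entry wins.
        self.index = {}
        for key, value in self.device:
            if value is TOMBSTONE:
                self.index.pop(key, None)
            else:
                self.index[key] = value

    def get(self, key):
        return self.index.get(key)


node = ToyNode()
node.put("a", "old")
node.put("b", "old")
node.delete("a")                 # non-durable: nothing persisted
node.delete("b", durable=True)   # durable: tombstone persisted

node.cold_restart()
print(node.get("a"))  # the deleted record is back
print(node.get("b"))  # stays deleted
```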
Hope this helps. For your case, if the cause of the inconsistent data is the cold restart in the Community Edition, you could consider deleting the storage before restarting each node and waiting for migrations to refill it before moving to the next node.
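One way to script the "wait for migrations" step is to poll each node's statistics and only move on once no partitions remain to be migrated. The sketch below is a minimal Python helper that parses the semicolon-separated `name=value` text returned by an info `statistics` request. The stat name `migrate_partitions_remaining` is an assumption about what your server version reports, so check it against your own output; the sample strings are made up.

```python
def parse_statistics(raw: str) -> dict:
    """Parse 'k1=v1;k2=v2' info output into a dict of strings."""
    stats = {}
    for pair in raw.strip().split(";"):
        if "=" in pair:
            name, value = pair.split("=", 1)
            stats[name] = value
    return stats


def migrations_complete(raw: str, stat: str = "migrate_partitions_remaining") -> bool:
    """True only if the stat is present and equal to zero.

    A missing stat is treated as 'not complete' so that a typo or a
    version difference never lets the upgrade proceed by accident.
    """
    stats = parse_statistics(raw)
    return stat in stats and int(stats[stat]) == 0


# Made-up sample output, for illustration only.
busy = "cluster_size=2;migrate_partitions_remaining=37;objects=1000"
done = "cluster_size=2;migrate_partitions_remaining=0;objects=1000"

print(migrations_complete(busy))
print(migrations_complete(done))
```

The raw text could come from running `asinfo -v statistics` on each node; the check should pass on every node in the cluster before the next one is stopped.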
I am using the Community Edition, so I can't use "Strong Consistency Mode".
I am trying to work around this problem by adding a "deleted" field to the object.
I think the Community Edition should have "Strong Consistency Mode",
because this is not the behavior one expects from a normal database.
I don't agree with you that a normal database only works fine on a single node.
Most databases (community editions) provide a cluster mode, and they don't have this issue.