Cluster node clock out of sync with the other 3 nodes

#1

Hello.

-sh-4.2$ asinfo -v build
3.16.0.1

We have a 4-node production cluster, and the clock on one node is out of sync with the others:

-sh-4.2$ hostname && date ; ssh sps-sm1-db-0 date ; ssh sps-sm1-db-1 date ; ssh sps-sm1-db-3 date
sps-sm1-db-2
Thu Mar  7 10:42:44 +03 2019
Thu Mar  7 10:42:44 +03 2019
Thu Mar  7 10:42:45 +03 2019
Thu Mar  7 11:14:12 +03 2019
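The offset in the output above can also be measured in seconds instead of read by eye. The following is only a sketch: it assumes passwordless ssh to each node, and the function names offset_seconds and check_nodes are made up for illustration.

```shell
#!/bin/sh
# Print each node's clock offset (in seconds) relative to the local host.
# Assumes ssh works without a password prompt; function names are illustrative.

offset_seconds() {
    # $1 = remote epoch seconds, $2 = local epoch seconds
    echo $(( $1 - $2 ))
}

check_nodes() {
    for host in "$@"; do
        local_now=$(date +%s)
        # Skip hosts that cannot be reached instead of printing a bogus offset.
        remote_now=$(ssh "$host" date +%s) || { echo "$host: unreachable"; continue; }
        echo "$host: $(offset_seconds "$remote_now" "$local_now")s"
    done
}

# Usage: check_nodes sps-sm1-db-0 sps-sm1-db-1 sps-sm1-db-3
```

A positive value means the remote clock is ahead of the local one. The result includes the ssh round-trip time, so it is only good for spotting offsets of seconds or more.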

How can I bring the clock on sps-sm1-db-3 back in line with the others without causing data inconsistencies?

Thanks.

#2

I think the following may work:

  1. back up that node
  2. stop the aerospike service on the bad node (32 minutes ahead)
  3. stop the ntp service
  4. set the date
  5. shut down the node
  6. power up the node
  7. adjust the time again if the date is still incorrect
  8. empty the aerospike data storage(s) and/or digestlog
  9. start aerospike again
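The service-level steps (2 to 4 and 9) could be sketched roughly as below, for a systemd host. This is an assumption-heavy outline, not a tested procedure: the unit names aerospike and ntpd, the TARGET_TIME value, and the DRY_RUN / run helpers are all hypothetical. The backup, the reboot, and emptying the storage are left out because they depend on the site and on the namespace storage configuration.

```shell
#!/bin/sh
# Dry-run sketch of steps 2-4 and 9 on the node with the wrong clock.
# Assumptions: systemd units "aerospike" and "ntpd"; TARGET_TIME is a placeholder.

DRY_RUN=${DRY_RUN:-1}                               # 1 = only print the commands
TARGET_TIME=${TARGET_TIME:-"2019-03-07 10:45:00"}   # hypothetical value

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stop_and_set_clock() {
    run systemctl stop aerospike    # step 2
    run systemctl stop ntpd         # step 3
    run date -s "$TARGET_TIME"      # step 4: set the clock by hand
}

rejoin_cluster() {
    # Steps 5-8 (reboot, re-check the date, empty the storage) happen before this.
    run date                        # confirm the clock one last time
    run systemctl start aerospike   # step 9
}
```

With DRY_RUN left at 1 the functions only print what they would do, which makes it easy to review the sequence before running anything as root.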

The remaining nodes should handle the normal traffic, although migrations may affect performance, so monitor the cluster carefully.

If you don’t empty the storage, you risk resurrecting deleted data or losing updates due to generation wrap. See this for details:

Adding an empty node will cause more traffic on the fabric, so again please monitor the system carefully.

Hope that makes sense and is helpful.

#3

Hello, Tony.

That worked just fine, many thanks! The data migration took only a few minutes for ~900,000 subscribers in the database, with this node being a VM. The traffic on the Aerospike cluster was not disturbed at all.

After stopping the ntp service, I expected to be able to set the clock with timedatectl, but it returned an error. The command that worked was ntpd -qg, which queried the NTP server and automatically synchronized the time on this client.
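For anyone landing here later, the sequence was roughly the following. It is a sketch only: it assumes the service unit is called ntpd and that it is run as root, and the function name resync_clock is just for illustration.

```shell
#!/bin/sh
# One-shot clock correction with ntpd on a systemd host.
# Assumptions: the unit is named "ntpd"; run as root.

resync_clock() {
    systemctl stop ntpd    # a running daemon already holds the NTP port
    ntpd -qg               # -q: set the clock once and exit; -g: allow a large correction
    systemctl start ntpd   # resume normal synchronization
    date                   # show the corrected time
}
```

The -g flag matters here because ntpd normally refuses to correct an offset larger than its panic threshold (1000 seconds by default), and this node was about half an hour off.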

Thanks again.