Upgrade from 4.3.0.6 to 4.4.0.6, data rollback


#1

I am trying to upgrade from 4.3.x.x to 4.4.x.x, and I tested it on a test server (CentOS 7).

But some data changed to old data within a short time.

I upgraded using the process below:

  1. Stop the Aerospike server: sudo systemctl stop aerospike
  2. Download 4.4.0.6
  3. Install 4.4.0.6: sudo ./asinstall
  4. Start the Aerospike server: sudo systemctl start aerospike

Any ideas?

My aerospike.conf is below:

service {
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        proto-fd-max 15000
}

logging {
        console {
                context any info
        }
}

network {
        service {
                address any
                port 3000
        }

        heartbeat {
                mode multicast
                multicast-group 239.1.99.222
                port 9918

                # To use unicast-mesh heartbeats, remove the 3 lines above, and see
                # aerospike_mesh.conf for alternative.

                interval 150
                timeout 10
        }

        fabric {
                port 3001
        }

        info {
                port 3003
        }
}

namespace set {
        replication-factor 2
        memory-size 2G
        default-ttl 0 # 0 = never expire/evict.

        storage-engine device {
                file /opt/aerospike/data/set.dat
                filesize 25G
                data-in-memory false # true would also keep a copy of the data in memory.
                write-block-size 128K
        }
}

namespace play {
        replication-factor 2
        memory-size 2G
        default-ttl 0 # 0 = never expire/evict.

        storage-engine device {
                file /opt/aerospike/data/play.dat
                filesize 25G
                data-in-memory false # true would also keep a copy of the data in memory.
                write-block-size 4M
        }
}

#2

That should work, I would think. What problem are you having? Are you trying to get old data back by installing an older version of the daemon?


#3

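With the Community Edition, every restart is a cold restart, which can resurrect deleted data: a delete removes the record from the in-memory index, but the old copy can remain on disk and be read back when the index is rebuilt from storage.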

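The other thing I can think of is a generation wrap-around while a node was down for the upgrade: the copy on the surviving node keeps taking writes until its generation counter wraps back to a small value, so when the upgraded node comes back, its older copy (which now has the higher generation) takes over (refer to conflict-resolution-policy).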
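A toy model of that wrap-around may make it concrete. It assumes a 16-bit generation that wraps from 65535 back to 1 (an assumption for illustration, not confirmed in this thread), and compares a naive "higher generation wins" rule with a "most recent write wins" rule, loosely mirroring the generation and last-update-time values of conflict-resolution-policy. The names Version, bump and resolve_* are hypothetical; this is not Aerospike's implementation.

```python
from dataclasses import dataclass

# Assumption for illustration: a 16-bit generation that wraps from 65535 back to 1.
MAX_GENERATION = 0xFFFF


@dataclass
class Version:
    value: str
    generation: int
    last_update_ms: int


def bump(v: Version, new_value: str, now_ms: int) -> Version:
    """Apply one write: new value, next generation (wrapping), new timestamp."""
    gen = v.generation + 1
    if gen > MAX_GENERATION:
        gen = 1  # wrap-around (assumed to skip 0)
    return Version(new_value, gen, now_ms)


def resolve_by_generation(a: Version, b: Version) -> Version:
    # Naive "higher generation wins", with no awareness of wrap-around.
    return a if a.generation >= b.generation else b


def resolve_by_last_update_time(a: Version, b: Version) -> Version:
    # "Most recently written wins".
    return a if a.last_update_ms >= b.last_update_ms else b


# The node taken down for the upgrade still holds an old copy near the limit.
stale = Version("old", generation=65530, last_update_ms=1_000)

# Meanwhile the surviving copy takes 10 more writes and wraps around.
live = stale
for i in range(10):
    live = bump(live, f"new-{i}", now_ms=2_000 + i)

print(live.generation)                                 # 5
print(resolve_by_generation(stale, live).value)        # old (stale copy wins)
print(resolve_by_last_update_time(stale, live).value)  # new-9
```

In this toy, resolving by last-update time avoids the wrap problem, at the cost of relying on write timestamps being comparable across nodes.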

There may be other edge cases, but they would be less common. Enterprise licensees can provide logs to Aerospike Support for in-depth analysis.


#4

Thanks for the reply.

How can I upgrade Aerospike while keeping the data safe? What is the best way?

Backup and restore?


#5

Backup and restore is definitely one common way, and probably the most straightforward, but it assumes a pause in write traffic so that the backup is as consistent as possible.
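A minimal sketch of that backup-and-restore path, written in Python to drive the asbackup and asrestore command-line tools. It assumes the tools are installed on the host, that writes are paused, and that the namespaces are set and play from the config in #1; the /backup directory and the helper names are hypothetical. By default it only prints the commands (dry run).

```python
from __future__ import annotations

import subprocess
from pathlib import Path

NAMESPACES = ["set", "play"]   # from the aerospike.conf in #1
BACKUP_ROOT = Path("/backup")  # hypothetical location with enough free space


def backup_commands(host: str = "127.0.0.1") -> list[list[str]]:
    """One asbackup invocation per namespace, each into its own directory."""
    return [
        ["asbackup", "--host", host, "--namespace", ns,
         "--directory", str(BACKUP_ROOT / ns)]
        for ns in NAMESPACES
    ]


def restore_commands(host: str = "127.0.0.1") -> list[list[str]]:
    """Matching asrestore invocations, run once the upgraded server is back up."""
    return [
        ["asrestore", "--host", host, "--directory", str(BACKUP_ROOT / ns)]
        for ns in NAMESPACES
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    run(backup_commands())   # 1. back up, with writes paused
    # 2. stop, upgrade (asinstall) and start the server, as in #1
    run(restore_commands())  # 3. reload the data
```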


#6

Thanks, Meher, for the quick reply. Is any other solution possible, apart from ‘backup and restore’?


#7

Well, we are going by the assumption that it is the cold restart that is resurrecting deleted records, but it could be a number of things. The alternative suggestions from my side all involve the Enterprise Edition (to avoid cold restarts, to potentially use XDR to migrate directly to a different cluster, and maybe consider strong consistency / durable delete to fully close the door on any inconsistencies).


#8

Thanks. Could you please explain the exact meaning of “maybe consider strong consistency / durable delete to fully close the door on any inconsistencies”?


#9

Sure. When you referred to ‘some data changed to old data’, it means that you ended up with inconsistent data. The source of the inconsistency can vary, and we just guessed that it could have been caused by the cold restart. If that is the cause, you can of course empty the storage on a node before restarting it, and wait for migrations to complete before moving on to the next node…

There are other situations that could cause inconsistencies (for example, split brain) when Aerospike operates in Available Mode. Aerospike Enterprise Edition can be configured to run in Strong Consistency Mode.

Running in strong consistency mode defaults to using durable deletes, which create tombstones and prevent the resurrection of deleted records upon a cold restart.
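A toy model of the difference between a plain delete (expunge) and a durable delete across a cold restart. It only assumes that every write appends a new copy to disk, that a plain delete touches just the in-memory index, and that a cold restart rebuilds the index from disk keeping the highest generation per key. ToyNode and its methods are hypothetical, not Aerospike code; in this toy the old copy is never overwritten, whereas on a real device it only comes back if its block has not been reused yet.

```python
from __future__ import annotations


class ToyNode:
    """One node: an append-only 'disk' plus an in-memory index rebuilt on cold restart."""

    def __init__(self) -> None:
        # (key, value or None for a tombstone, generation)
        self.disk: list[tuple[str, str | None, int]] = []
        self.index: dict[str, tuple[str | None, int]] = {}

    def write(self, key: str, value: str) -> None:
        gen = self.index.get(key, (None, 0))[1] + 1
        self.disk.append((key, value, gen))
        self.index[key] = (value, gen)

    def delete(self, key: str, durable: bool) -> None:
        if durable:
            gen = self.index[key][1] + 1
            self.disk.append((key, None, gen))  # tombstone is persisted like a write
            self.index[key] = (None, gen)
        else:
            del self.index[key]  # expunge: index only, old copies stay on disk

    def cold_restart(self) -> None:
        # Rebuild the index by scanning disk, keeping the highest generation per key.
        self.index = {}
        for key, value, gen in self.disk:
            if key not in self.index or gen > self.index[key][1]:
                self.index[key] = (value, gen)

    def get(self, key: str) -> str | None:
        return self.index.get(key, (None, 0))[0]


results = {}
for durable in (False, True):
    node = ToyNode()
    node.write("user:1", "v1")
    node.write("user:1", "v2")
    node.delete("user:1", durable=durable)
    node.cold_restart()
    results[durable] = node.get("user:1")

print(results)  # {False: 'v2', True: None}
```

The expunged record comes back with its last value, while the tombstone, being the newest copy on disk, keeps it deleted.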

Hope this helps… but for your case, if the cause of the inconsistent data is the cold restart in the Community Edition, you could consider deleting the storage on each node when restarting it, and waiting for migrations to refill it before moving on to the next node.
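The wipe-and-refill procedure can be sketched as a toy simulation of a two-node cluster with replication-factor 2, so every record has a copy on the other node while one node is empty. Migrations are reduced to copying the peers' live records, and Node, migrate_into, rolling_upgrade and demo are hypothetical names; this illustrates the ordering (one node at a time, wait for the refill), not Aerospike internals.

```python
from __future__ import annotations


class Node:
    """Toy node: `disk` keeps every copy ever written, `index` is what clients see."""

    def __init__(self) -> None:
        self.disk: dict[str, list[str]] = {}  # key -> all copies written, oldest first
        self.index: dict[str, str] = {}       # key -> live value

    def write(self, key: str, value: str) -> None:
        self.disk.setdefault(key, []).append(value)
        self.index[key] = value

    def expunge(self, key: str) -> None:
        self.index.pop(key, None)  # plain delete: old copies remain on disk

    def cold_restart(self) -> None:
        # Every key still on disk comes back with its newest copy.
        self.index = {k: copies[-1] for k, copies in self.disk.items()}

    def wipe(self) -> None:
        # Stand-in for deleting the data file before starting the node.
        self.disk.clear()
        self.index.clear()


def migrate_into(target: Node, peers: list[Node]) -> None:
    """Stand-in for migrations: refill `target` from the live data of its peers."""
    for peer in peers:
        for key, value in list(peer.index.items()):
            target.write(key, value)


def rolling_upgrade(nodes: list[Node], wipe: bool) -> None:
    for node in nodes:
        peers = [n for n in nodes if n is not node]
        if wipe:
            node.wipe()          # stop, delete storage, upgrade, start empty
        else:
            node.cold_restart()  # stop, upgrade, start: index rebuilt from disk
        migrate_into(node, peers)  # wait for migrations before the next node


def demo(wipe: bool) -> list[list[str]]:
    nodes = [Node(), Node()]
    for n in nodes:  # replication-factor 2: both nodes hold both records
        n.write("user:1", "v1")
        n.write("user:2", "keep")
    for n in nodes:
        n.expunge("user:1")
    rolling_upgrade(nodes, wipe)
    return [sorted(n.index) for n in nodes]


print(demo(wipe=False))  # [['user:1', 'user:2'], ['user:1', 'user:2']]
print(demo(wipe=True))   # [['user:2'], ['user:2']]
```

The procedure depends on the other replicas being complete before each wipe; a node holding the only copy of some data would lose it.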


#10

Thanks Meher.

I am using the Community Edition, so I can't use “Strong Consistency Mode”. I'm trying to work around this problem by adding a “deleted” field to the object.
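A minimal sketch of that “deleted” field workaround, using a plain dict as a stand-in for the database; soft_delete and read are hypothetical helpers, not Aerospike client calls. Because the delete is an ordinary write, it is stored like any other update, so a cold restart should bring back the newer copy with the flag set rather than the old live one.

```python
from __future__ import annotations

from typing import Any

Record = dict[str, Any]


def soft_delete(store: dict[str, Record], key: str) -> None:
    """Mark a record as deleted with a normal write instead of removing it."""
    record = store.get(key)
    if record is not None:
        store[key] = {**record, "deleted": True}


def read(store: dict[str, Record], key: str) -> Record | None:
    """Treat missing and soft-deleted records the same way."""
    record = store.get(key)
    if record is None or record.get("deleted", False):
        return None
    return record


store: dict[str, Record] = {"user:1": {"name": "kim", "deleted": False}}
soft_delete(store, "user:1")
print(read(store, "user:1"))  # None
print(store["user:1"])        # {'name': 'kim', 'deleted': True}
```

The trade-off is that soft-deleted records keep using memory and disk, and every reader has to check the flag.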

I think the Community Edition should have “Strong Consistency Mode”, because this is unexpected behavior for a normal database.


#11

If you can support your data needs on a single node (like a normal database), then you wouldn’t need to trade off either consistency or availability.


#12

Hi Kporetr.

I don't agree with you that a normal database only works correctly on a single node. Most databases (community editions) provide a cluster mode, and they don't have this issue.