AEROSPIKE_ERR_CLUSTER_CHANGE when querying or scanning a namespace


#1

Synopsis

When I query or scan a namespace, why does aql return “Error: (7) AEROSPIKE_ERR_CLUSTER_CHANGE”? I get the same error when I run the operation from the C or Java client.

In Java, the exception is:

com.aerospike.client.AerospikeException: Error Code 7: Cluster key mismatch

I know that the node is alive, the node belongs to a cluster, and the namespace holds records. I have run queries and scans in the past. Why do I see this error?

Answer

A scan that runs while the cluster is changing may return inaccurate data (duplicate or missing records). Aerospike therefore rejects scans while the cluster is rebalancing data.

aql returns the error “Error: (7) AEROSPIKE_ERR_CLUSTER_CHANGE” because the cluster is migrating partitions.

In the following example, we see that the nodes are migrating. Notice the non-zero values under the ‘Migrates’ column. The aql scan cannot complete successfully:

Monitor> info node
===NODES===
2015-10-14 15:57:11.885725
Sorting by IP, in Ascending order: 
ip:port                 Build   Cluster      Cluster   Free   Free   Migrates              Node         Principal   Replicated    Sys
                            .      Size   Visibility   Disk    Mem          .                ID                ID      Objects   Free
                            .         .            .    pct    pct          .                 .                 .            .    Mem
192.168.160.129:3000    3.6.2         2         true      0     99      (0,1)   BB9DD05EF290C00   BB9F06318290C00       84,028     75
192.168.160.132:3000    3.6.2         2         true      0     98   (6244,0)   BB9F06318290C00   BB9F06318290C00      542,292     69
Number of nodes displayed: 2
aql> select * from test
Error: (7) AEROSPIKE_ERR_CLUSTER_CHANGE

You see the same error when you query a secondary index while the nodes are migrating.

After the nodes complete migrations, run the command again in aql or issue the scan from a client, and you should get the expected results.
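If you would rather wait for migrations to finish than disable the check, a client can simply retry the scan. The following is a minimal sketch, not an Aerospike API: `retry_scan`, `run_scan`, and `retry_on` are hypothetical names, and you would pass in whichever exception class your client raises for error code 7.

```python
import time


def retry_scan(run_scan, retry_on, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call run_scan() until it succeeds or the attempts are used up.

    run_scan  -- hypothetical zero-argument callable that performs the scan
    retry_on  -- exception class (or tuple of classes) that signals a
                 cluster change; supplied by the caller, not assumed here
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return run_scan()
        except retry_on:
            # Out of attempts: let the caller see the original error.
            if attempt == attempts:
                raise
            # Give the cluster time to finish migrating before retrying.
            sleep(delay_seconds)
```

Each retry restarts the scan from the beginning, so if `run_scan` collects records through a callback, it should reset its own results at the start of every call.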

Workaround

If you prefer that scans proceed even though the results may be inaccurate, you can disable this check in the client policies.

Example in aql:

aql> set FAIL_ON_CLUSTER_CHANGE false
FAIL_ON_CLUSTER_CHANGE = false

Some older versions of aql may not have this flag; use “HELP SET” to confirm.

Example in Java: set the following ScanPolicy field to false. https://www.aerospike.com/apidocs/java/com/aerospike/client/policy/ScanPolicy.html#failOnClusterChange

public boolean failOnClusterChange

Example in Python: set ‘fail_on_cluster_change’ to False in the scan policy passed to foreach.

       s.foreach(callback, {'fail_on_cluster_change': False})

#2

I removed the argument --no-cluster-change. It is running now. Do you think the applications are not getting responses to their GETs at the moment?


#3

This only refers to scans during migration, not to reading single records (i.e. GET).


#4

What if migration threads are set to 0? Will the results still be inaccurate?