Hello, I just restarted one node in a cluster (3 nodes, replication factor 2) and I see this in the logs of the restarted node:
May 22 2015 02:55:32 GMT: WARNING (tsvc): (thr_tsvc.c::424) rejecting client transaction - initial partition balance unresolved
and this on the other nodes:
May 22 2015 02:57:02 GMT: INFO (paxos): (paxos.c::2367) Cluster Integrity Check: Detected succession list discrepancy between node bb920103bcb2b78 and self bb9f0afb752aed4
May 22 2015 02:57:02 GMT: INFO (paxos): (paxos.c::2412) CLUSTER INTEGRITY FAULT. [Phase 1 of 2] To fix, issue this command across all nodes: dun:nodes=bb920103bcb2b78
What is this about? How can I deal with it?
After stopping the node, waiting, and starting it again, it is gone. Is it related to the restart? Aerospike version is 3.5.8.
There is a very small window when a node joins a cluster: the other nodes begin to advertise the node before it has finished creating its partition table. If a client picks up that advertised service and makes a request during that window, the request is rejected with this warning.
I wouldn't expect this window to last very long at all. In fact, this is the first time I have seen this message being logged, so congratulations. How long did this message last?
I see this warning non-stop until I stop the node.
If I restart it, the warning appears again.
Could you share information about the environment you are running?
What OS? Kernel?
Is this running in a virtualized environment?
Have you been able to reproduce the issue? If so can you provide your method?
Are the Aerospike nodes on different machines?
Also, could you share your /etc/aerospike/aerospike.conf?
These are not virtual machines; the cluster is 3 bare-metal servers. My configs are here: How to increase threads used by UDFs? - #10 by raj
I was able to reproduce it just by running /etc/init.d/aerospike restart. I will check on Monday whether it is still reproducible.
The configuration there has replication-factor configured to 0! And the namespace is not persisted.
If this is still the case, then I would expect data loss when a node is dropped. Also, replication-factor 0 should be an illegal setting. I am not sure what behavior you will see with that.
The minimum replication-factor should be 1, which is to say that there is only a single copy of the data in the cluster. This means that if a single node drops, a portion of your data will not be in the cluster.
If you want 2 copies in the cluster, then replication-factor needs to be configured to 2.
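For reference, a minimal sketch of a namespace stanza that keeps two copies of the data might look like the following. The namespace name and the memory-size and default-ttl values are placeholders I made up for illustration, not values taken from your configuration, so adjust them to match your setup.

```
# Hypothetical namespace stanza for /etc/aerospike/aerospike.conf.
# Only replication-factor is the point here; the other values are placeholders.
namespace test {
    replication-factor 2    # keep 2 copies of each record in the cluster
    memory-size 4G          # placeholder value
    default-ttl 30d         # placeholder value
    storage-engine memory   # in-memory only, i.e. not persisted
}
```

With storage-engine memory the namespace is still not persisted, so the second copy on another node is the only protection when a node restarts.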
Sorry, I sent you the wrong config. That config is for a single-server installation. I will send the right one a bit later.
@nizsheanez,
A JIRA ticket has been filed to make replication-factor 0 an illegal setting. It's AER-3863, just for reference.
We look forward to seeing your config!
Cheers,
Maud