Server Version: 3.1.1.2
Java Client Version: 3.2.4
Setup: 3-node cluster with a single namespace and replication factor 2.
Observation
1. When reads/writes are in progress on the cluster and one node goes down, reads/writes fail with connection errors for a couple of seconds and then continue without any issue.
2. The same behavior occurs if a second node also goes down some time later.
3. However, if two nodes go down simultaneously, 60-70% of the client's reads/writes keep failing indefinitely, until the Java code re-initiates the connection to the server. This is not expected; the expected behavior is the same as in 1 and 2.
The issue is easily reproducible. It looks like a bug in the client library.
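
For reference, the workaround mentioned above (re-initiating the connection from the Java code) can be sketched roughly as follows. This is only an illustration and does not use the client library's API: `ReconnectingClient`, the `factory` supplier, and the `maxConsecutiveFailures` threshold are all hypothetical names, and the sketch assumes the real client can be closed and re-created safely.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper (not part of any client library): wraps a client and
// re-creates it after a number of consecutive failed operations.
class ReconnectingClient<C extends AutoCloseable> {
    private final Supplier<C> factory;          // assumed: builds a fresh client/connection
    private final int maxConsecutiveFailures;   // assumed threshold before reconnecting
    private C client;
    private int consecutiveFailures = 0;
    private int reconnects = 0;

    ReconnectingClient(Supplier<C> factory, int maxConsecutiveFailures) {
        this.factory = factory;
        this.maxConsecutiveFailures = maxConsecutiveFailures;
        this.client = factory.get();
    }

    // Runs one read/write. The failure is still rethrown to the caller, but
    // once the threshold is reached the client is replaced for later calls.
    synchronized <R> R execute(Function<C, R> operation) {
        try {
            R result = operation.apply(client);
            consecutiveFailures = 0;            // any success resets the counter
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= maxConsecutiveFailures) {
                reconnect();
            }
            throw e;
        }
    }

    private void reconnect() {
        try {
            client.close();                     // best effort; the old client may already be broken
        } catch (Exception ignored) {
            // nothing useful to do if closing fails
        }
        client = factory.get();
        reconnects++;
        consecutiveFailures = 0;
    }

    synchronized int reconnectCount() {
        return reconnects;
    }
}
```

With the real client, the supplier would construct a new client instance against the seed hosts and each read/write would go through `execute`. A threshold that is too low may trigger needless reconnects during the short failure window described in cases 1 and 2, so it would have to be tuned.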