Outage is causing users to see old data (potential data loss)


Two nodes came up with the same multicast address and put our production servers into a bad state. It took us a while to track down the cause. The logs told us to run the following command, and we did:

Jul 21 2015 20:07:51 GMT: INFO (paxos): (paxos.c::2412) CLUSTER INTEGRITY FAULT. [Phase 1 of 2] To fix, issue this command across all nodes: dun:nodes=bb9fc7396171500,bb98ce7ef902500,bb955e9ef902500,bb9327296171500

Now people are seeing old data coming from the database. What could be causing this, and how can we restore the data to the state it was in before the outage?




Can you describe how you recovered from this outage?