Urgent: Migration stuck v3.8.1, missing acks from node

Well, this doesn’t seem very interesting after all. There is a little thrash from your clients, but it doesn’t seem too terrible (do you guys periodically restart groups of clients?). But hb and fabric sockets are fairly stable and show signs of a node being ejected and re-added. I don’t see any strong evidence here of a network issue.

Later releases brought improvements to the fabric and hb layers (also clustering in 3.3.13). The fabric on your server is logging a concerning message that I haven’t seen before:

Jun 22 2017 23:14:18 GMT: INFO (fabric): (fabric.c:1288) can’t write to notification file descriptor: will probably have to take down process

I believe this was a code path that wasn’t expected to be reachable. The newer servers definitely will not have this issue since this particular socket was eliminated.
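
If it helps to see how often that message shows up and whether it lines up with the stalled migration, here is a rough sketch for pulling its timestamps out of a server log. The function name and the sample lines are made up for illustration (the second sample line is not real server output), and the timestamp format is assumed from the one line quoted above. The pattern accepts both straight and curly apostrophes, since forum software tends to convert them.

```python
import re
from datetime import datetime

# Timestamp prefix plus the fabric message quoted above.
# ['’] accepts a straight or a curly apostrophe in "can't".
PATTERN = re.compile(
    r"^(?P<ts>\w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}) GMT: "
    r".*can['’]t write to notification file descriptor"
)


def find_notification_fd_errors(lines):
    """Return the timestamp of every line carrying the fabric message."""
    hits = []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            # Format assumed from the quoted line, e.g. "Jun 22 2017 23:14:18".
            hits.append(datetime.strptime(match.group("ts"), "%b %d %Y %H:%M:%S"))
    return hits


# Illustrative input: the first line is the message from this thread,
# the second is a made-up placeholder that should not match.
sample = [
    "Jun 22 2017 23:14:18 GMT: INFO (fabric): (fabric.c:1288) can't write to "
    "notification file descriptor: will probably have to take down process",
    "Jun 22 2017 23:14:19 GMT: INFO (example): placeholder line, not real output",
]

for ts in find_notification_fd_errors(sample):
    print(ts.isoformat())
```

To run it against a real log, pass an open file object instead of `sample`; the log path depends on your configuration.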