Cluster nodes connections unbalanced


#1

I have 2 clusters of Aerospike:

  1. Contains 5 servers
  2. Contains 2 servers

Both clusters are built on physical HW, not VMs. Based on the ‘info’ stats, it looks like some nodes are loaded significantly more than others.

~~~~~~~~~Network Information~~~~~~~~~
     Build   Cluster        Cluster     Cluster         Principal   Client       Uptime
         .      Size            Key   Integrity                 .    Conns            .
C-3.14.1.2         5   85C25B244A5D   True        BB9F34D69771814     **4647**   3383:27:55
C-3.14.1.2         5   85C25B244A5D   True        BB9F34D69771814     1704   2879:34:25
C-3.14.1.2         5   85C25B244A5D   True        BB9F34D69771814     1404   4280:55:22
C-3.14.1.2         5   85C25B244A5D   True        BB9F34D69771814     **4378**   3118:33:43
C-3.14.1.2         5   85C25B244A5D   True        BB9F34D69771814     2172   99:35:29

~~~~~~~~~Network Information~~~~~~~~~
     Build   Cluster        Cluster     Cluster         Principal   Client     Uptime
         .      Size            Key   Integrity                 .    Conns          .
C-3.15.0.1         2   29371661794F   True        BB9306F99DA6618     **1519**   76:10:37
C-3.15.0.1         2   29371661794F   True        BB9306F99DA6618     **3376**   77:11:47

What could be the reason for this? Also, based on the clients' error stats, the more heavily loaded instances end up with read errors more often:

Errors      Instance
206286  ->  **4647**   3383:27:55
  7297  ->  **4378**   3118:33:43
   146  ->  2172   99:35:29
231793  ->  **3376**   77:11:47

#2

It could be due to a hot key, or the nodes with more connections may be more sluggish than the others. Check the histograms to see whether ops/s are higher on some nodes, whether any transaction queue is building up, and whether any latency is present. I assume you have already checked resource availability on the systems and looked for any non-INFO messages in the logs?
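
To put a number on how uneven the connections are before digging into histograms, here is a minimal Python sketch. It only uses the Client Conns values posted above. The 1.25x cutoff and the `imbalance` helper are my own assumptions for illustration, not anything Aerospike defines.

```python
from statistics import mean

# Client connection counts copied from the 'info' output in the question,
# grouped by cluster key.
clusters = {
    "85C25B244A5D": [4647, 1704, 1404, 4378, 2172],
    "29371661794F": [1519, 3376],
}


def imbalance(conns, threshold=1.25):
    """Return (count, ratio_to_mean, flagged) for each node.

    threshold is an arbitrary cutoff chosen for this sketch: a node is
    flagged when it holds more than threshold times the cluster mean.
    """
    avg = mean(conns)
    return [(c, round(c / avg, 2), c / avg > threshold) for c in conns]


for key, conns in clusters.items():
    print(f"cluster {key}")
    for count, ratio, flagged in imbalance(conns):
        marker = "  <- above threshold" if flagged else ""
        print(f"  {count:>5}  {ratio:.2f}x mean{marker}")
```

With the posted numbers this flags the same three nodes that were bolded in the question, which are also the ones reporting the most read errors. If a hot key is the cause, I would expect those nodes to stand out in ops/s as well.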