Okay, so here’s an update. My guess is that this has something to do with querying: possibly the cron that writes data back to Aerospike is somehow resetting the secondary index (sindex), the data, or some kind of query counters.
I ran a latency histogram on the queries for that namespace, shown below. I turned off the cron, and at exactly 14:22 you can see the queries start to slow down and connections increase, until we reach the point where we start getting timeouts. I then re-enabled the cron at 14:26:56 and ran a sync, and things start to settle back down again:
Sep 14 2017 14:17:26
slice-to (sec) %>1ms %>8ms %>64ms ops/sec
-------------- ------ ------ ------ ----------
14:17:36 10 29.68 0.83 0.01 1011.2
14:17:46 10 30.02 0.57 0.00 1023.4
14:17:56 10 29.11 0.54 0.00 1016.1
14:18:06 10 29.59 0.40 0.00 1027.8
14:18:16 10 30.21 0.25 0.00 999.4
14:18:26 10 27.03 0.51 0.00 988.8
14:18:36 10 30.09 0.33 0.00 990.8
14:18:46 10 29.77 0.20 0.00 1028.8
14:18:56 10 29.67 0.51 0.00 1012.3
14:19:06 10 28.69 0.73 0.00 1039.3
14:19:16 10 29.76 0.56 0.00 981.9
14:19:26 10 27.55 0.86 0.00 1011.9
14:19:36 10 27.52 1.55 0.02 1031.6
14:19:46 10 27.55 1.70 0.01 1050.6
14:19:56 10 27.66 0.73 0.00 1016.8
14:20:06 10 27.22 0.47 0.00 1024.1
14:20:16 10 27.08 0.65 0.00 1054.0
14:20:26 10 25.97 0.46 0.00 1029.0
14:20:36 10 28.20 0.57 0.00 1016.9
14:20:46 10 29.47 0.90 0.00 989.9
14:20:56 10 30.69 0.56 0.00 985.1
14:21:06 10 29.34 0.60 0.00 1016.7
14:21:16 10 30.23 0.61 0.00 989.8
14:21:26 10 29.90 0.39 0.00 990.0
14:21:36 10 29.58 0.45 0.00 1008.0
14:21:46 10 29.86 0.54 0.00 1016.1
14:21:56 10 31.49 0.79 0.00 1019.6
14:22:06 10 49.85 16.46 0.00 1032.9
14:22:16 10 57.00 18.56 0.00 1004.6
14:22:26 10 65.38 20.42 0.00 1029.7
14:22:36 10 68.66 19.89 0.00 994.3
14:22:46 10 71.37 17.52 0.00 997.3
14:22:56 10 71.43 17.58 0.00 993.5
14:23:06 10 71.74 16.54 0.00 1027.1
14:23:16 10 71.27 14.89 0.00 1031.2
14:23:26 10 71.30 25.44 0.01 1013.0
14:23:36 10 71.82 62.48 23.32 927.4
14:23:46 10 71.72 63.57 24.73 915.9
14:23:56 10 72.38 68.15 43.12 905.4
14:24:06 10 71.87 66.89 37.14 915.0
14:24:16 10 71.54 63.94 21.96 880.4
14:24:26 10 71.59 61.68 37.51 843.6
14:24:36 10 71.52 68.19 53.40 906.1
14:24:46 10 71.71 71.07 71.07 941.8
14:24:56 10 72.16 64.21 40.34 856.9
14:25:06 10 70.78 59.69 32.06 803.5
14:25:16 10 70.81 61.21 50.98 846.1
14:25:26 10 72.57 67.73 54.83 876.3
14:25:36 10 71.68 60.90 42.47 853.5
14:25:46 10 70.50 56.07 40.48 909.6
14:25:56 10 72.00 59.38 47.10 854.7
14:26:06 10 72.64 66.00 53.21 891.3
14:26:16 10 71.65 64.29 53.48 849.6
14:26:26 10 72.21 62.42 44.23 832.6
14:26:36 10 71.36 62.47 47.23 813.5
14:26:46 10 71.34 64.92 51.24 853.4
14:26:56 10 69.34 35.80 17.72 1151.1
14:27:06 10 72.89 21.81 17.61 1511.4
14:27:16 10 71.96 20.17 16.51 1332.9
14:27:26 10 71.89 17.70 11.29 1199.7
14:27:36 10 71.18 11.36 1.52 1095.9
14:27:46 10 71.15 3.00 0.00 1044.1
14:27:56 10 71.09 3.46 0.00 1051.7
14:28:06 10 70.86 2.99 0.01 1026.0
14:28:16 10 70.65 2.01 0.00 1010.0
14:28:26 10 70.67 2.17 0.00 1021.4
14:28:36 10 70.25 2.03 0.00 987.1
14:28:46 10 70.56 0.97 0.00 1011.1
14:28:56 10 70.30 0.89 0.00 993.4
14:29:06 10 69.83 0.89 0.00 1009.3
14:29:16 10 69.44 0.94 0.00 1037.2
14:29:26 10 68.91 0.54 0.00 1001.3
14:29:36 10 69.48 0.77 0.00 1042.4
14:29:46 10 69.00 0.63 0.00 1013.9
14:29:56 10 69.44 0.48 0.00 1043.7
14:30:06 10 69.40 1.60 0.00 953.8
14:30:16 10 69.11 1.04 0.00 1041.3
-------------- ------ ------ ------ ----------
avg 55.74 20.66 12.14 994.0
max 72.89 71.07 71.07 1511.4
Could something be building up over time, or is there some kind of limit we’re hitting with queries? How would we test for that and find out?
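To line the slowdown up against the cron schedule more precisely, the histogram output can be scanned for the slices where latency jumps. Below is a rough Python sketch, not an Aerospike tool: the names `parse_rows` and `slow_slices` and the 5% threshold are my own assumptions for illustration, and the sample rows are copied from the output above.

```python
import re

# Matches data rows such as "14:22:06 10 49.85 16.46 0.00 1032.9".
# Header, separator, avg and max lines do not match and are skipped.
ROW = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)$"
)

def parse_rows(text):
    """Turn pasted histogram text into a list of dicts, one per time slice."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append({
                "time": m.group(1),          # end of the slice
                "span": int(m.group(2)),     # slice length in seconds
                "gt1": float(m.group(3)),    # % of queries slower than 1 ms
                "gt8": float(m.group(4)),    # % of queries slower than 8 ms
                "gt64": float(m.group(5)),   # % of queries slower than 64 ms
                "ops": float(m.group(6)),    # queries per second
            })
    return rows

def slow_slices(rows, column="gt8", threshold=5.0):
    """Return slice times where the chosen bucket exceeds threshold (in %)."""
    return [r["time"] for r in rows if r[column] > threshold]

# A few rows copied from the histogram in this post.
SAMPLE = """
14:21:56 10 31.49 0.79 0.00 1019.6
14:22:06 10 49.85 16.46 0.00 1032.9
14:23:36 10 71.82 62.48 23.32 927.4
14:27:46 10 71.15 3.00 0.00 1044.1
avg 55.74 20.66 12.14 994.0
"""

if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    print("slices with >5% of queries over 8 ms:", slow_slices(rows))
    print("slices with >5% of queries over 64 ms:", slow_slices(rows, "gt64"))
```

Running this over the full output and comparing the first flagged slice with the time the cron last ran should show how long it takes for the slowdown to start once the writes stop.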
Thanks