A user wants to know why the data on their cluster is not distributed evenly across all of the nodes.
To see how the data is distributed, we start by looking at the number of master objects each node holds for the namespace:
=== NAMESPACE ===
ip/namespace             Master Objects
aerospike1-12/test-ns        34,254,650
aerospike1-11/test-ns        32,121,480
aerospike1-1/test-ns         31,079,740
aerospike1-7/test-ns         31,075,307
aerospike1-9/test-ns         29,874,205
aerospike1-8/test-ns         29,120,584
aerospike1-10/test-ns        29,112,686
aerospike1-5/test-ns         28,685,034
aerospike1-6/test-ns         27,043,474
aerospike1-2/test-ns         26,793,969
aerospike1-3/test-ns         26,620,033
aerospike1-4/test-ns         25,924,147
To find out whether the data in this cluster is normally distributed, we need the mean (average), the variance (a measure of how far the values spread out from the mean, built from the squared differences from the mean), and the standard deviation (a measure used to quantify the amount of variation or dispersion of a set of data values).
To find the mean, we take the total number of master objects in the cluster, 351,705,309, and divide it by the number of nodes, 12 in this example, which gives 29,308,776 (rounded from 29,308,775.75). Next we subtract the mean from each node's object count, square each difference, add up the squares, and divide the sum by n - 1 = 11 (the sample variance), which gives 6,323,040,910,769.
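The mean and variance steps can be sketched in plain Python. This is only an illustration, not part of any Aerospike tooling: the dictionary simply hard-codes the Master Objects values from the NAMESPACE table, and the names `master_objects`, `sample_variance` and `population_variance` are hypothetical. It also computes the population variance (dividing by n) for comparison, which shows that the 6,323,040,910,769 figure corresponds to dividing by n - 1.

```python
# Master Objects per node for namespace test-ns, copied from the NAMESPACE table.
master_objects = {
    "aerospike1-12": 34_254_650,
    "aerospike1-11": 32_121_480,
    "aerospike1-1": 31_079_740,
    "aerospike1-7": 31_075_307,
    "aerospike1-9": 29_874_205,
    "aerospike1-8": 29_120_584,
    "aerospike1-10": 29_112_686,
    "aerospike1-5": 28_685_034,
    "aerospike1-6": 27_043_474,
    "aerospike1-2": 26_793_969,
    "aerospike1-3": 26_620_033,
    "aerospike1-4": 25_924_147,
}

counts = list(master_objects.values())
n = len(counts)        # number of nodes (12)
total = sum(counts)    # total master objects in the cluster
mean = total / n       # average objects per node

# Squared difference of each node's count from the mean.
squared_diffs = [(c - mean) ** 2 for c in counts]

sample_variance = sum(squared_diffs) / (n - 1)   # divides by n - 1 (11)
population_variance = sum(squared_diffs) / n     # divides by n (12), for comparison

print(f"total master objects: {total:,}")
print(f"mean:                 {mean:,.0f}")
print(f"sample variance:      {sample_variance:,.0f}")
print(f"population variance:  {population_variance:,.0f}")
```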
The standard deviation is the square root of the variance, which is 2,514,566.
Now we can show which nodes fall within one standard deviation (2,514,566), two standard deviations (5,029,131), and three standard deviations (7,543,697) of the mean.
1 standard deviation = 2,514,566; a normal distribution puts about 68% of values within 26,794,210 to 31,823,341
2 standard deviations = 5,029,131 (2,514,566 * 2); a normal distribution puts about 95% of values within 24,279,644 to 34,337,907
3 standard deviations = 7,543,697 (2,514,566 * 3); a normal distribution puts about 99.7% of values within 21,765,079 to 36,852,473
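The standard deviation bands can be checked against the actual nodes. This sketch uses Python's standard `statistics` module (`statistics.mean`, and `statistics.stdev`, which is the sample standard deviation and so matches the variance figure above) and reports which nodes fall inside each band. The data is again hard-coded from the NAMESPACE table, and the names `within` and `outside_1sd` are illustrative only.

```python
import statistics

# Master Objects per node for namespace test-ns, copied from the NAMESPACE table.
master_objects = {
    "aerospike1-12": 34_254_650, "aerospike1-11": 32_121_480,
    "aerospike1-1": 31_079_740, "aerospike1-7": 31_075_307,
    "aerospike1-9": 29_874_205, "aerospike1-8": 29_120_584,
    "aerospike1-10": 29_112_686, "aerospike1-5": 28_685_034,
    "aerospike1-6": 27_043_474, "aerospike1-2": 26_793_969,
    "aerospike1-3": 26_620_033, "aerospike1-4": 25_924_147,
}

counts = list(master_objects.values())
mean = statistics.mean(counts)
stdev = statistics.stdev(counts)  # sample standard deviation (divides by n - 1)

within = {}  # k -> nodes whose count lies within k standard deviations of the mean
for k, normal_share in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    low, high = mean - k * stdev, mean + k * stdev
    within[k] = [node for node, c in master_objects.items() if low <= c <= high]
    print(f"{k} SD = {k * stdev:,.0f}: {low:,.0f} to {high:,.0f} -> "
          f"{len(within[k])}/{len(counts)} nodes (normal curve: ~{normal_share})")

# Nodes that fall outside the one-standard-deviation band.
outside_1sd = sorted(set(master_objects) - set(within[1]))
print("outside 1 SD:", ", ".join(outside_1sd))
```

With these counts, 7 of the 12 nodes fall within one standard deviation (aerospike1-2 misses the lower bound by about 241 objects) and all 12 fall within two standard deviations.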
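As the histogram below shows, the per-node counts cluster around the mean in a roughly bell-shaped (normal) pattern: 7 of the 12 nodes fall within one standard deviation and all 12 fall within two, which is consistent with ordinary statistical variation.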
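Each row counts the nodes whose master-object count rounds up to that value; for example, 30 M covers counts above 29 M up to 30 M.
# of RECORDS   # of NODES
26 M           1  #
27 M           2  ##
28 M           1  #
29 M           1  #
30 M           3  ###
32 M           2  ##
33 M           1  #
35 M           1  #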
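The histogram can be rebuilt from the raw counts. The "# of RECORDS" labels match rounding each node's count up to the next whole million, so this sketch assumes that bucketing (it is inferred from the numbers, not stated in the original). It also prints the empty 31 M and 34 M buckets so the gaps in the curve are visible.

```python
import math
from collections import Counter

# Master Objects per node for namespace test-ns, copied from the NAMESPACE table.
counts = [
    34_254_650, 32_121_480, 31_079_740, 31_075_307, 29_874_205, 29_120_584,
    29_112_686, 28_685_034, 27_043_474, 26_793_969, 26_620_033, 25_924_147,
]

# Assumed bucketing: round each count UP to the next whole million,
# which reproduces the "# of RECORDS" labels in the histogram.
buckets = Counter(math.ceil(c / 1_000_000) for c in counts)

print(f"{'# of RECORDS':<15}# of NODES")
for millions in range(min(buckets), max(buckets) + 1):
    nodes = buckets[millions]  # a Counter returns 0 for empty buckets
    label = f"{millions} M"
    print(f"{label:<15}{nodes:<3}{'#' * nodes}")
```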
Additional information on standard deviation: http://www.mathsisfun.com/data/standard-deviation.html