I am working on a problem with a large data set of 4 million records. Say we have two bins named basic_salary and gender in the same set, and both bins are populated in all 4 million records. I want to calculate the percentiles of basic_salary where gender is Male or Female. The tricky requirement is that the percentiles must be computed in near real time, within about 1-2 seconds.
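To make the computation concrete, here is a minimal plain-Python sketch of the result I am after. It is not Aerospike code: the function name `percentiles_by_gender`, the record layout (dicts with `gender` and `basic_salary` keys), the sample values, and the choice of the nearest-rank percentile definition are all assumptions for illustration.

```python
import math
from collections import defaultdict


def percentiles_by_gender(records, points=(25, 50, 75, 90)):
    """Return nearest-rank percentiles of basic_salary, grouped by gender.

    `records` is assumed to be an iterable of dicts with the keys
    "gender" and "basic_salary" (a hypothetical in-memory layout).
    """
    # Collect the salaries for each gender.
    salaries = defaultdict(list)
    for rec in records:
        salaries[rec["gender"]].append(rec["basic_salary"])

    result = {}
    for gender, values in salaries.items():
        values.sort()
        n = len(values)
        # Nearest-rank: the p-th percentile is the value at rank ceil(p/100 * n).
        result[gender] = {
            p: values[max(0, math.ceil(p / 100 * n) - 1)] for p in points
        }
    return result


# Made-up sample data, only to show the input and output shapes.
sample = [
    {"gender": "Male", "basic_salary": 1000},
    {"gender": "Male", "basic_salary": 2000},
    {"gender": "Male", "basic_salary": 3000},
    {"gender": "Male", "basic_salary": 4000},
    {"gender": "Female", "basic_salary": 1500},
    {"gender": "Female", "basic_salary": 2500},
]

print(percentiles_by_gender(sample, points=(50, 90)))
```

Sorting everything in one process like this is only a definition of the output; the open question is how to get the same numbers out of Aerospike across 4 million records within the time limit.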
I went through the Aerospike documentation and I have a feeling that MapReduce on a clustered deployment might be the answer. However, I have no idea how to build that kind of system, because I have worked with Aerospike for just a few days.
Has anybody had a similar experience and can help me figure out a solution to this problem?
I would highly appreciate it.