I came across Aerospike a week ago; I have read all the documentation and tested it. Its approach is quite different from any other NoSQL implementation I’ve used, but I’m satisfied with this database. Great work (although I wasn’t expecting a shared-nothing architecture to be limited in the number of nodes).
We are designing an API that relies heavily on geospatial indexes, so we are considering Aerospike for its speed and for its geo indexes based on S2 (which I believe are very good). As the documentation didn’t address my concerns, I’m asking for further information about the implementation.
From reading the docs, this is what I was able to gather:
- They use S2, producing a CellID (an integer) and storing it in a secondary index.
- When an area is queried, they compute the S2 cells needed to cover it and run a few range queries on the secondary index.
- Secondary indexes are colocated with the data, and data is placed according to the hash of the key, so queries are sent to ALL nodes.
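To make my mental model concrete, here is a minimal Python sketch of the three points above. It is only a toy simulation of what I assume happens, not Aerospike's actual code: `NUM_NODES`, `node_for_key`, `Node` and `query_area` are names I made up, the cell IDs are small integers rather than real S2 CellIDs, and the SHA-1 modulo placement is just a stand-in for whatever hashing the server really uses.

```python
import hashlib
from bisect import bisect_left, insort

NUM_NODES = 4  # hypothetical cluster size


def node_for_key(key):
    """Pick a node from a hash of the primary key (stand-in for the real placement)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_NODES


class Node:
    def __init__(self):
        # Node-local secondary index: sorted list of (cell_id, key) pairs.
        self.index = []

    def insert(self, cell_id, key):
        insort(self.index, (cell_id, key))

    def range_query(self, lo, hi):
        """Return keys whose cell_id lies in [lo, hi] on this node only."""
        start = bisect_left(self.index, (lo,))
        end = bisect_left(self.index, (hi + 1,))
        return [key for _, key in self.index[start:end]]


def build_cluster(points):
    nodes = [Node() for _ in range(NUM_NODES)]
    for key, cell_id in points:
        # Placement depends on the key hash, not on the cell ID.
        nodes[node_for_key(key)].insert(cell_id, key)
    return nodes


def query_area(nodes, cell_ranges):
    """Run every cell range against every node and count the requests sent."""
    keys, requests = [], 0
    for lo, hi in cell_ranges:
        for node in nodes:  # no node can be skipped: matches may live anywhere
            requests += 1
            keys.extend(node.range_query(lo, hi))
    return sorted(keys), requests


points = [(f"p{i}", i) for i in range(100)]  # toy data: cell ID equals i
nodes = build_cluster(points)
keys, requests = query_area(nodes, [(10, 19), (40, 44)])
print(len(keys), requests)  # 15 matching records, 2 ranges x 4 nodes = 8 requests
```

In this model, neighbouring cells end up scattered over all nodes because placement ignores the cell ID, which is why every range has to be sent everywhere.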
Here is my concern:
If the above is true, then querying for all points in a specific area produces multiple range queries (one per S2 cell range needed to cover the area), and each of them hits EVERY node in the cluster. I think that in a query-intensive scenario this will quickly become a bottleneck, and horizontal scaling will not help at all.
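A back-of-the-envelope version of that argument, as a small Python sketch. The function name and all the numbers are invented for illustration, and it assumes that every range query really is sent to every node:

```python
def per_node_rate_scatter(area_qps, ranges_per_query, num_nodes):
    """Range requests per second each node receives when every range goes to every node."""
    cluster_total = area_qps * ranges_per_query * num_nodes
    return cluster_total / num_nodes  # num_nodes cancels out


for n in (3, 6, 12):
    # Made-up load: 1000 area queries per second, 8 cell ranges each.
    print(n, per_node_rate_scatter(1000, 8, n))
```

I understand that with more nodes each node holds fewer records, so each individual request should be cheaper, but under this assumption the number of requests a node has to serve does not go down as the cluster grows.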
Am I wrong? Are there other optimizations, such as S2 cell partitioning, that help improve performance? Or can a geographic point (S2 cell) be used as the primary index, so that data is partitioned by CellID and range queries don’t have to hit all nodes?
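What I have in mind with partitioning by CellID is roughly the following. This is a hypothetical sketch, not something I found in the docs: `SPLITS`, `node_for_cell` and `nodes_for_range` are my own names, and the cell-ID space is a toy range of 0..999 rather than real 64-bit S2 CellIDs.

```python
from bisect import bisect_right

# Hypothetical split points: node i owns cell IDs in [SPLITS[i], SPLITS[i + 1]),
# and the last node owns everything from SPLITS[-1] upward.
SPLITS = [0, 250, 500, 750]  # 4 nodes over a toy cell-ID space of 0..999


def node_for_cell(cell_id):
    """Return the node that owns a given cell ID under range partitioning."""
    return bisect_right(SPLITS, cell_id) - 1


def nodes_for_range(lo, hi):
    """Return only the nodes whose cell-ID range overlaps [lo, hi]."""
    return list(range(node_for_cell(lo), node_for_cell(hi) + 1))


print(nodes_for_range(10, 19))    # a small range stays on one node
print(nodes_for_range(240, 260))  # a range crossing a split point touches two nodes
```

With a layout like this, a cell range would be routed to one or two nodes instead of all of them. I realize it could also create hot spots if the data is geographically skewed, so I’m asking whether anything along these lines exists or is planned.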
Thanks for your time.