I’m aggregating a data using IPv4 ranges, so clients might retrieve it later on. Usually clients send me an IP and expect to get the aggregated data in return. I’m just wondering what is the best strategy for implementing this on Aerospike?
For example, there are 4 ranges in DB: 10.0.0.0 - 10.255.255.255 with data A 10.10.0.0 - 10.10.255.255 with data B 10.10.10.0 - 10.10.10.255 with data C 10.10.10.13 -10.10.10.13 with data D
So if, user sends me IP 10.10.10.13, I should return [A,B,C,D]. If it something like 10.10.0.25 I should return just [A,B]
I’ve tried several approaches:
To store the ranges as I’d do it in RDBMS where each range falls into a separate row with 3 bins ‘ip_from’,‘ip_to’ and ‘ip_data’; with secondary index on ‘ip_from’ and UDF for filtering ‘ip_to’ filed. So if someone asks me what I’ve got for the IP a.b.c.d, I simply create the integer representation (INT_IP) of it and run an aggregation query with ‘ip_from’ between 0 and INT_IP and ‘ip_to’ >= INT_IP, aggregate the data and send it to the client. Well it works for small numbers or rows …
To split ranges on IP’s, store them in a single column with a secondary index and then retrieve the data by simple equals clause. The down side, is that the solution potentially requires a huge amount of space (will end up with 4B addresses + duplicated data).
Something in between, by making first 2 IP’s bytes the row key. This will give me up to 65025 rows, each with possibly 65025 bins. When I need to find an ip, I go to the row key and then pick up the bin with name similar to the requested IP and bin containing the data.
Of course, I may try to use 3 bytes as the row key and bucket of IP ranges in bins, but in this case I’d need to go over all bins to find out what is the exact range I’m falling in.
Might it be another, more optimized, solutions?