Storing/Querying IP


I’m aggregating a data using IPv4 ranges, so clients might retrieve it later on. Usually clients send me an IP and expect to get the aggregated data in return. I’m just wondering what is the best strategy for implementing this on Aerospike?

For example, there are 4 ranges in DB: - with data A - with data B - with data C - with data D

So if, user sends me IP, I should return [A,B,C,D]. If it something like I should return just [A,B]

I’ve tried several approaches:

  1. To store the ranges as I’d do it in RDBMS where each range falls into a separate row with 3 bins ‘ip_from’,‘ip_to’ and ‘ip_data’; with secondary index on ‘ip_from’ and UDF for filtering ‘ip_to’ filed. So if someone asks me what I’ve got for the IP a.b.c.d, I simply create the integer representation (INT_IP) of it and run an aggregation query with ‘ip_from’ between 0 and INT_IP and ‘ip_to’ >= INT_IP, aggregate the data and send it to the client. Well it works for small numbers or rows …

  2. To split ranges on IP’s, store them in a single column with a secondary index and then retrieve the data by simple equals clause. The down side, is that the solution potentially requires a huge amount of space (will end up with 4B addresses + duplicated data).

  3. Something in between, by making first 2 IP’s bytes the row key. This will give me up to 65025 rows, each with possibly 65025 bins. When I need to find an ip, I go to the row key and then pick up the bin with name similar to the requested IP and bin containing the data.

Of course, I may try to use 3 bytes as the row key and bucket of IP ranges in bins, but in this case I’d need to go over all bins to find out what is the exact range I’m falling in.

Might it be another, more optimized, solutions?

I am not able to give a direct answer. But see if this discussion (and the stackoverflow link there) can help you. Various techniques are discussed. Ip ranges/CIDR