How to query IP to Geo data from Aerospike


#1

Hi All,

We have an IP-to-Geo dataset that contains ranges of IPs and their respective locations. But I wonder how we can query this data, as we do not have a ‘like’ or greater-than/less-than operator.

Any thoughts on how we can determine the location from the IP of the request?

Regards.


#2

Hi Dragon —

There was a question recently about using Aerospike for IP ranges. It’s over here:

Storing each IP address as a 32-bit number is practical in Aerospike: that is only 4B keys, so the indexes would take only 256G of memory if fully populated, which it won’t be. An optimization of storing information by Class C (one key per /24 block) yields only about 16M keys (2^24), which requires only about 1G of RAM at the same per-key index cost. You can then use columns (Aerospike bins) to store information on the entire Class C, or on individual IP addresses. As Aerospike uses a sparse representation, this is quite efficient if you have only a few addresses in a Class C. If you are doing a lot of reads, each read is a single KV lookup, touches a minimal number of servers, is extraordinarily parallel, and performs very well.
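A minimal sketch of the key derivation described above, in Python. A plain dict stands in for the Aerospike set, so no client API is shown here; the function names, bin names and the sample location are illustrative assumptions, not real geo data.

```python
import ipaddress

def ipv4_to_int(ip: str) -> int:
    """Flatten a dotted-quad IPv4 address to its 32-bit integer form."""
    return int(ipaddress.IPv4Address(ip))

def class_c_key(ip: str) -> int:
    """Drop the last octet so every address in a /24 block shares one key."""
    return ipv4_to_int(ip) >> 8

# Stand-in for the key-value store: key -> bins (hypothetical bin names).
store = {}

# Made-up record for a documentation-only block (203.0.113.0/24).
store[class_c_key("203.0.113.0")] = {"country": "SG", "lat": 1.3667, "lon": 103.8}

def lookup(ip: str):
    """Single key lookup: returns the bins for the block, or None."""
    return store.get(class_c_key(ip))

print(ipv4_to_int("192.0.2.10"))   # 32-bit integer form of the address
print(lookup("203.0.113.77"))      # any address in the block hits the same key
print(lookup("198.51.100.1"))      # no record for this block
```

With a real cluster the dict access would become one record read per request, which is why this pattern stays cheap under heavy read load.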

An alternate pattern is to think of the problem in a more relational way. If your geo information is supplied as a “network” (with a unique name and a range), you can insert a row per network, use the integer form of the IP address in a column, index that column, and look up using range queries — just like you would do in a relational or SQL system (but it’ll be parallel and distributed).
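To make the range idea concrete, here is a rough sketch in Python. It does not use the Aerospike client: a sorted list plus `bisect` stands in for the indexed integer column, and the network names and ranges are hypothetical. It also assumes the ranges do not overlap.

```python
import bisect
import ipaddress

# Hypothetical rows: (network name, first address, last address).
networks = [
    ("net-a", "192.0.2.0", "192.0.2.255"),
    ("net-b", "198.51.100.0", "198.51.100.127"),
    ("net-c", "203.0.113.0", "203.0.113.255"),
]

# One row per network, with both ends of the range in integer form.
rows = sorted(
    (int(ipaddress.IPv4Address(lo)), int(ipaddress.IPv4Address(hi)), name)
    for name, lo, hi in networks
)
starts = [r[0] for r in rows]  # stand-in for the index on the start column

def find_network(ip: str):
    """Find the row with the largest start <= ip, then check its end."""
    n = int(ipaddress.IPv4Address(ip))
    i = bisect.bisect_right(starts, n) - 1
    if i >= 0 and rows[i][1] >= n:
        return rows[i][2]
    return None

print(find_network("198.51.100.5"))    # inside net-b
print(find_network("198.51.100.200"))  # past the end of net-b, so no match
```

The same two steps apply with a real secondary index: a range query narrows the candidates by the start value, and the end value is checked on the returned rows.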

These general patterns are discussed in that other forum topic.

In either case, I suggest you use GeoJSON for the actual geo data. We’ll be adding support for GeoJSON, so in a future release you’ll be able to “reverse index”, i.e., create an index on that column and find the IP addresses associated with a location, if you’re interested in that kind of feature. While this feature isn’t imminent, it is “soon”.


Ip ranges/CIDR
#3

Thanks a lot for the reply. I referred to the thread you linked and also your response here.

Truly, I am still a bit confused. I would really appreciate it if you could walk through an example.

For instance, I was looking at the sample MaxMind data I got from the MaxMind site. The GeoLite City CSV shows data like this:

2001:200::,2001:200:ffff:ffff:ffff:ffff:ffff:ffff,42540528726795050063891204319802818560,42540528806023212578155541913346768895,JP,,,36,138,,0,0
2001:208::,2001:208:ffff:ffff:ffff:ffff:ffff:ffff,42540529360620350178005905068154421248,42540529439848512692270242661698371583,SG,,,1.3667,103.8,,0,0
2001:218::,2001:218:ffff:ffff:ffff:ffff:ffff:ffff,42540530628270950406235306564857626624,42540530707499112920499644158401576959,JP,,,36,138,,0,0
2001:220::,2001:220:ffff:ffff:ffff:ffff:ffff:ffff,42540531262096250520350007313209229312,42540531341324413034614344906753179647,KR,,,37,127.5,,0,0

I assume the first two columns are patterns for IP addresses (not sure about v4/v6) and the next two are the corresponding integer values. How do I store these?

Assuming I get a request from IP address a.b.c.d, how do I query Aerospike for the location (country, state and city)? Can you please illustrate with an example of a key and how we can query it? (Apologies for the confusion.)

thanks once again.

Regards.


#4

Hi, you’ve got IPv6 addresses there. Unfortunately, those don’t flatten to integers in a reasonable way: an IPv6 address is 128 bits, too big for a 64-bit integer, whereas IPv4 is only 32 bits.

In the case of your specific file, you can see that each line is a range: the first two columns are the first and last address, and the range covers 2001:2xx and all the addresses inside it. Each one is a /32, so there’s an easy special case that will create super-quick lookups.
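Here is a small Python sketch of that special case, using one row from the sample above. The column positions for country, latitude and longitude are my reading of the sample rows, and the dict is only a stand-in for the Aerospike set, so treat the bin names and layout as assumptions.

```python
import ipaddress

row = ("2001:208::,2001:208:ffff:ffff:ffff:ffff:ffff:ffff,"
       "42540529360620350178005905068154421248,"
       "42540529439848512692270242661698371583,SG,,,1.3667,103.8,,0,0")
fields = row.split(",")

first = ipaddress.IPv6Address(fields[0])
last = ipaddress.IPv6Address(fields[1])

# Columns 3 and 4 are the same two addresses written as 128-bit integers.
ints_match = int(first) == int(fields[2]) and int(last) == int(fields[3])

# Collapse the first/last pair into CIDR form; this row is a single /32.
nets = list(ipaddress.summarize_address_range(first, last))

def prefix32(addr) -> int:
    """Top 32 bits of an IPv6 address, small enough to use as an integer key."""
    return int(ipaddress.IPv6Address(addr)) >> 96

# Stand-in for the store: one record per /32 (assumed column meanings).
store = {
    prefix32(nets[0].network_address): {
        "country": fields[4],
        "lat": float(fields[7]),
        "lon": float(fields[8]),
    }
}

def lookup(addr: str):
    """Keep only the top 32 bits of the request address and do one key lookup."""
    return store.get(prefix32(addr))

print(ints_match, nets)
print(lookup("2001:208:1234::1"))
```

This only works because every range in the sample is exactly a /32; a file with mixed prefix lengths would need a different key scheme.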

This shows an optimization pattern where you’d probably need to pass the addresses to a parser that gets you the /xx representation. If the /xx representation is /64 or less, then you can use the integer trick.
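One way to read the /64 remark, sketched in Python: when a range is a /64 or shorter, the top 64 bits of its first and last address are enough to describe it, and those fit in a 64-bit integer. The helper names are hypothetical, and whether the value also fits a signed 64-bit integer depends on the address (it does for the 2001: ranges shown here).

```python
import ipaddress

def top64(addr: str) -> int:
    """Upper 64 bits of an IPv6 address."""
    return int(ipaddress.IPv6Address(addr)) >> 64

def range_bounds64(first: str, last: str):
    """Return (low, high) 64-bit bounds for each CIDR block in the range.

    Raises ValueError if a block is longer than /64, where the top 64 bits
    alone no longer identify the range.
    """
    bounds = []
    blocks = ipaddress.summarize_address_range(
        ipaddress.IPv6Address(first), ipaddress.IPv6Address(last))
    for net in blocks:
        if net.prefixlen > 64:
            raise ValueError(f"{net} is longer than /64")
        bounds.append((int(net.network_address) >> 64,
                       int(net.broadcast_address) >> 64))
    return bounds

bounds = range_bounds64("2001:200::", "2001:200:ffff:ffff:ffff:ffff:ffff:ffff")
low, high = bounds[0]
print(hex(low), hex(high))
print(low <= top64("2001:200:abcd::1") <= high)  # address inside the range
```

With bounds in this form, the IPv4-style range lookup on an integer column becomes possible again for these rows.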

I’ve been reading up on the structure of IPv6 to recommend how to index these addresses, but I haven’t found the right structure yet. Your continued posts would be cool, even if you just show the data formats you’re getting from providers.

Of course you can always simply insert an individual address, but you’ve got ranges here. The IPv4 trick of reducing the high and low to integers and doing a range lookup doesn’t work for full 128-bit addresses.


#5

Thanks bbulkow

So for IPv6, if I understood correctly, we can keep 2001:200, 2001:208, 2001:218 as the keys and the corresponding location as the value, in bins such as city, state and country. When we get a request from 2001:208.200.233.233.xx.xx, we take just 2001:208 from it and do a lookup, i.e. the location is SG.

Is that right?

Also, can you please elaborate on the IPv4 integer trick, with an example of the key, value and query? I am unable to visualize it.

I will share more info on the geo data as soon as it is available.

Regards


#6

Hi bbulkow,

Any update on this? Gentle reminder.

Regards, Dragon


#7

I don’t have further info on how to encode IPv6. Perhaps someone else does.