Bump? Am I allowed to bump? Just need some guidance on the right way to solve this.
I also found the answer quoted below.
Is this a better solution? I actually understand this one, as opposed to what I posted above.
I wonder how aerospike would fare?
As I say, I have 18m records which is gonna balloon doing it the /24 way. I’ll post my findings and my code in PHP.
Would be nice to hear from someone so I’m not talking to the void.
The approach we use for fast Geo-IP resolution is to take all the IP ranges, break them at the /24 boundary (the first three quads), and store one record per /24 holding all the matches for those addresses. This gives you at most 2^24 (about 16.7 million) keys and O(1) access. If you'll tolerate the client-side complexity of unpacking the stored record, it's performant without taking up lots of RAM.
In more detail:
Take all ranges, and break them by their first 24 bits.
The range 128.100.60.0-128.100.60.9 becomes one record, <128.100.60 | 0 9 | (...recA...)>
The range 128.100.60.10 - 128.100.62.80 would become <128.100.60 | 10 255 | (...recB...)>, <128.100.61 | 0 255 | (...recB...)>, and <128.100.62 | 0 80 | (...recB...)>.
Combine all the records with the same prefix into one hash, keyed by the top of each piece's range. So:
key 128.100.60: {9: {...recA...}, 255: {...recB...}}
key 128.100.61: {255: {...recB...}}
key 128.100.62: {80: {...recB...}, ...}
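The two steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the names `split_by_slash24` and `build_index` are made up for this example, the records are plain strings standing in for real Geo-IP data, and the index is an in-memory dict rather than whatever key-value store you end up using. Like the scheme described, it keeps only the top of each piece, so it does not detect gaps between ranges.

```python
import ipaddress
from collections import defaultdict

def split_by_slash24(start, end, record):
    """Yield (prefix, low, high, record) pieces, one per /24 block the range touches."""
    lo = int(ipaddress.IPv4Address(start))
    hi = int(ipaddress.IPv4Address(end))
    # lo >> 8 and hi >> 8 are the 24-bit prefixes of the two endpoints.
    for block in range(lo >> 8, (hi >> 8) + 1):
        first = max(lo, block << 8) & 0xFF           # last octet where this piece starts
        last = min(hi, (block << 8) | 0xFF) & 0xFF   # last octet where this piece ends
        prefix = ".".join(str((block >> shift) & 0xFF) for shift in (16, 8, 0))
        yield prefix, first, last, record

def build_index(ranges):
    """Group the pieces by /24 prefix, keyed by the top of each piece's range."""
    index = defaultdict(dict)
    for start, end, record in ranges:
        for prefix, _first, last, rec in split_by_slash24(start, end, record):
            index[prefix][last] = rec
    return dict(index)

# The two example ranges from the post (record contents are placeholders).
ranges = [
    ("128.100.60.0", "128.100.60.9", "recA"),
    ("128.100.60.10", "128.100.62.80", "recB"),
]
index = build_index(ranges)
```

With those two ranges, `index` ends up with the three keys shown above: `128.100.60`, `128.100.61` and `128.100.62`.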
To retrieve a specific IP, retrieve the compound record by its 24-bit key, and return the first result whose sub-key is at least as large as the last octet (the sub-key is the inclusive top of its range). If I looked up 128.100.60.20, I would find that 9 was too small, but that 255 was not, and so return recB.
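A minimal sketch of that lookup in Python, assuming the compound records are already available as a dict shaped like the example keys above (the `lookup` function and the hard-coded `index` are hypothetical, just for illustration). It treats each sub-key as the inclusive top of its range, so `.9` still belongs to recA, and it returns `None` when the last octet is past every sub-key.

```python
import bisect

# Hypothetical in-memory stand-in for the stored compound records.
index = {
    "128.100.60": {9: "recA", 255: "recB"},
    "128.100.61": {255: "recB"},
    "128.100.62": {80: "recB"},
}

def lookup(index, ip):
    """Return the record covering ip, or None if no stored range reaches it."""
    prefix, _, last_octet = ip.rpartition(".")
    entries = index.get(prefix)
    if not entries:
        return None
    tops = sorted(entries)  # sub-keys are the tops of the ranges
    # bisect_left finds the first sub-key that is >= the last octet.
    pos = bisect.bisect_left(tops, int(last_octet))
    if pos == len(tops):
        return None  # the address is above every range stored for this /24
    return entries[tops[pos]]

result = lookup(index, "128.100.60.20")
```

Each compound record holds at most 256 sub-keys, so sorting them on every call is cheap; if it matters, the sub-keys can be stored pre-sorted instead.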
This is a common strategy for doing range joins (even spatial joins!) in things like Hadoop: partition on some reasonable chunk, and then index on one end of the range.