Nice approach. It reminds me of an approach I saw used to resolve coordinates to countries. Instead of loading all country polygons, the team created a bitmap and used colors to map each pixel to a country code. The bitmap wasn't super large and compresses pretty nicely in png format. This worked well enough and it dumbed down the country lookup to simply figuring out the color for a coordinate. Neat trick. And you could probably figure out if you are dealing with an edge case by simply looking at neighboring pixels and fall back to something more expensive if you hit one of those.
And of course with edge cases, there are lots of them but mostly it's fine. One case that comes to mind is that of the border town of Baarle-Nassau On the border with the Netherlands and Belgium. This village has some of the weirdest borders in the world. There are Belgian exclaves inside Dutch enclaves. In some cases the border runs through houses and you can enter in one country and leave in another. Some of the exclaves are just a few meters. There are a few more examples like this around the world.
Another issue is the fractal nature of polygons. I once found a polygon for New Zealand that was around 200MB that broke my attempts to index it. This doesn't matter of course for resolving country codes because it is an island. But it's a reason I implemented the Douglas Peucker algorithm to simplify the polygon mentioned in the article at some point.
The bitmap approach you describe allows for immediate (i.e. O(1) ) lookup of region by coordinate, which is pretty neat. Space-efficiency-wise, a bitmap (+ index that maps color to country) might not be the most efficient data structure, though, as there are more than 256 countries, so you already need 16 bits for each pixel instead of 8. Then, you have the additional complexity of if you actually want the bitmap to be viewable by humans, you need to make sure that the colors for neighbouring countries at least are sufficiently distinct.
Anyway, a Kotlin library I wrote uses a similar technique to make requests for the majority of locations immediate, while also handling the edge cases - i.e. when querying a location near a border.
https://github.com/westnordost/countryboundaries (also available in Rust)
What it does is to slice up the input geometry (e.g. a GeoJson) into many small cells in a raster. So, when querying for a location, one doesn't need to do point-in-polygon checks for potentially huge polygons, but just for those little slices that are in the cell one is querying for. And of course, if a country completely covers a cell, we don't even need to do any point-in-polygon check anymore. All this slicing is done in a preprocessing step, so the actual library consumes a serialized data structure that is already in this sliced-up format.
I needed it to be fast because in my app I display a lot of POIs on the map for which there is logic that is dependent on in which country/state the POI is located.
There are 249 countries with an iso code; so 8 bits might be enough. So it's not that bad. But even at 32 bits it would probably be fine and you could cram in some more data.
There are many similar things of course but nothing that was multiplatform, which I needed. I actually created a multiplatform kotlin library for working with language and country codes a few months ago: https://github.com/jillesvangurp/ko-iso
It seems we have some shared interests. I'll check out your library.
What you describe is nice strategy for indexing things. I've done some similar things. Another library (jillesvangurp/geogeometry) I maintain allows you to figure out which map tiles cover a polygon cover a polygon. Map tiles are nice because they are basically quad tree paths. I have a similar algorithm that does that with geohashes. You could use both for indexing geospatial stuff.
Slicing up the polygons sounds interesting. I've been meaning to have a go at intersect/union type operations on geometries. I added a boolean intersects recently to check whether geometries intersect each other. I already had containment check.
> There are 249 countries with an iso code; so 8 bits might be enough.
There are 249 ISO 3166-1 country codes:
* https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes
But 193 sovereign states recognized by the UN:
* https://en.wikipedia.org/wiki/Member_states_of_the_United_Na...
Some of the discrepancy can be accounted for by "legacy" codes like .su for the Soviet Union.
> But 193 sovereign states recognized by the UN
All of which are also in the 249 ISO 3166-1 list; it's a super set. It doesn't include the historical ones anymore. Codes for those are interesting if you have old data perhaps.
I remember seeing this technique in a video by Sebastian Lague: https://youtu.be/sLqXFF8mlEU?t=787
Really cool