How Bad Are Geolocation Tools? Really, Really Bad
from the what-a-mess dept
Geolocation is one of those tools that the less technically minded like to use to feel smart. At its core it's a database, showing locations for IP addresses, but like most database-based tools, the old maxim of GIGO [Garbage In, Garbage Out] applies. Over the weekend Fusion's Kashmir Hill wrote a great story about how one geolocation company has sent hundreds of people to one farm in Kansas for no reason other than laziness. And yes, it's exactly as bad as it sounds.
Most people aren't the most technically minded. Give them a tool, tell them it CAN produce an output, and they'll assume that any output that looks like the best quality possible IS the best one available. It's extremely common with 'forensic evidence' and jurors in court cases, where it's given weight well beyond its actual evidentiary value (to the point that they now distrust cases without it). There's even a name for it, "the CSI effect", named after one of the TV shows that uses it as a cornerstone.
One of the latest tools to get the blind trust of morons is IP Geolocation. At its basic level, it's a database of IP addresses with latitude and longitude listed, so when you look up an IP address, you get a pair of coordinates you can associate as an 'origin' for that.
However, there are a number of problems with that:
- First, what about those that don't have a lat/long listed?
- Secondly, how often are they updated?
- Third, how do they deal with cellular or 'mobile' devices?
So let's quickly address them.
Those that don't have a lat/long listed.
Well, there are a few ways to do it, but the way some chose is just to guess. The article that started me on this points out that the company MaxMind decided to guess at the average closest place it could: the geographical center of the US. Except 39°50'N 98°35'W is a messy decimal (39.8333333 N, 98.585522 W), so it rounded them to 38N, 97W. That's the front yard of a farm in Kansas.
Other times they just guess and get a town and put it somewhere there, although even that can be off a bit. It can be a lot off, as you'll see shortly.
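To make the failure mode concrete, here is a minimal sketch of a lookup that silently falls back to a default. It is not MaxMind's actual code: the table, the prefix-based key and the `lookup` function are all hypothetical, and the sample address comes from a range reserved for documentation. Only the 38, -97 default and the sample coordinates are taken from this article.

```python
from typing import Dict, Tuple

Coord = Tuple[float, float]

# The rounded "center of the US" default described above.
US_DEFAULT: Coord = (38.0, -97.0)

# Toy table keyed by the first three octets (hypothetical layout).
# 192.0.2.0/24 is a documentation-only range, not a real subscriber.
TOY_DB: Dict[str, Coord] = {
    "192.0.2": (34.0247, -84.5033),
}

def lookup(ip: str, db: Dict[str, Coord] = TOY_DB) -> Tuple[Coord, bool]:
    """Return (coordinates, is_guess) for a dotted-quad IPv4 address."""
    prefix = ip.rsplit(".", 1)[0]  # drop the last octet
    if prefix in db:
        return db[prefix], False
    # No entry: hand back the default, which looks just like a real answer
    # unless the caller bothers to check the is_guess flag.
    return US_DEFAULT, True

if __name__ == "__main__":
    for ip in ("192.0.2.7", "198.51.100.9"):
        coords, is_guess = lookup(ip)
        print(ip, coords, "GUESS" if is_guess else "listed")
```

The `is_guess` flag is the important part. Drop it, and a consumer of the data has no way to tell a listed location from a shrug.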
How often are they updated?
There's no telling. With the great shortage of IPv4 addresses now, but with an ever-expanding list of devices, from cell phones to thermostats and even fridges, IP addresses are shifting around everywhere. There's also mergers and splits of companies, bankruptcies and so on. So unless the database is frequently updated, there's no chance that anything it has to say will be accurate – again we'll see that directly.
Finally, how does it deal with cellular devices?
Simply put, they don't. The handoff mechanism means that you'll often carry one IP address from one tower to the next (otherwise you'd have to terminate and restart any data transfer as you shifted between towers). In addition, most cellular providers hide their cell customers behind NAT, precisely because of the lack of discrete IPv4 addresses to give out (and their… slowness in migrating to IPv6).
Odds are you're going to get a local network control center, or regional corporate office instead, which means it's practically no use at all.
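If you have access to the device itself, one rough hint that NAT is in play is the address the device was handed. The sketch below uses Python's standard `ipaddress` module to check for the private ranges and the 100.64.0.0/10 shared address space that RFC 6598 set aside for carrier-grade NAT. The function name `behind_nat_hint` is made up for this example, and not every carrier uses these ranges, so treat it as a hint and nothing more.

```python
import ipaddress

# Shared address space reserved for carrier-grade NAT (RFC 6598).
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")

def behind_nat_hint(device_ip: str) -> bool:
    """True if the device-side address suggests it sits behind NAT.

    A geolocation service never sees this address; it only sees the
    public address of the carrier's gateway.
    """
    addr = ipaddress.ip_address(device_ip)
    return addr.is_private or addr in CGNAT_RANGE

if __name__ == "__main__":
    for ip in ("10.12.34.56", "100.64.1.1", "8.8.8.8"):
        print(ip, behind_nat_hint(ip))
```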
Oh dear....
This all assumes as well that entries are made in good faith. One of the more common uses of geolocation is for targeted adverts, especially with 'adult websites', where they promise there's a horny woman (or man, if your browsing is detected as such, or the 'content' suggests you may be female) close by. Or you may have seen it in the scam adverts on news sites that should know better than to accept low-rate advertising based on scams (with easy to tell, clickbait headlines about insurance 'tricks' or similar).
This means that if you can 'rig' the database, you can expose the stupidity in parts of it, as was best demonstrated by Randall Munroe in his XKCD comic series.
So just how inaccurate are these systems? The easiest way to tell by far is to run some IP addresses where you know the location through these systems and see how far off they can be. So I did.
The most obvious one to start with is my own home connection's IP address. So I tried the link in the story, and boy was it off! Just for the record, I live on the south side of Atlanta's metro area, near Macon (Walking Dead country, in fact).
That's right, it put me in Ottawa, capital of Canada, roughly 1900km (1180 miles) and 1 whole country off. Part of that comes from the second question, how current the data is. It's listing my IP as belonging to Nortel Networks. Problem is, I'm not a subscriber to Nortel. No one is; the company was wound down years ago. Yet some databases still have them listed.
Cellphones don't fare much better either. I used the same service on a 4G Verizon phone sitting at my computer. Its location: San Diego. That's 1900 miles (3050km) off. Other services gave locations of New York, Atlanta, and Macon.
Wondering if it's just my semi-rural system that's messed up, I called a few friends who live in the Atlanta suburbs (a few streets from each other) and asked for their IP addresses; one used Comcast, and the other AT&T. Maybe things will be better and more accurate in a big-city environment?
I ran a number of different GeoIP services, and it was a very mixed bag of results. One thing's certain though: none of the four sets of coordinates gave an accurate location for the person (for obvious reasons I'm not going to give you their address, or mine for that matter).
Of them all, only one service, IPCIM.com, gave an error circle with a location (twenty-five mile radius), but it didn't do it for all. To me that indicates knowledge of its inaccuracy, but its lack at other times seems to show it just doesn't care.
The second and third locations are the same coordinates, but they're less certain of the third than the second, despite both being off.
There's also something specific to note. There are 4 providers covered here. Two were done from the exact same location, yet their locations came nowhere near matching. Two more were IP addresses just streets away, but they also didn't match that well, although many went to the same default locations, including two which went to the 'lazy US Center' investigated in the Fusion piece.
More importantly, of the 30+ geolocating attempts made here, not a single one managed to be within a mile of the actual location (although one location was within a mile and a half, while another was within 3 miles – again, I'm not going to give out specifics). So for those who want to rely on them as being a source of where something is, the simple answer is "don't". This applies as much to those tracking down people who are leaving spammy comments, as it does to police officers and lawyers seeking to use them for court actions criminal or civil.
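If you want to run the same kind of check yourself, the measurement is just the great-circle distance between what the service reports and where you know the connection to be. Below is a minimal sketch using the standard haversine formula with a mean Earth radius of 6371 km. Since I'm not giving out the true locations, the example instead compares two answers that different services gave for the same AT&T address (coordinates from the data table at the end of this post). The function name is my own.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; good enough for this purpose

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

if __name__ == "__main__":
    # Same AT&T IP address, two different answers.
    ipinfo_answer = (38.0, -97.0)          # the default "center of the US"
    geoip2_answer = (34.0247, -84.5033)    # a point in north Georgia
    gap = haversine_km(*ipinfo_answer, *geoip2_answer)
    print(f"Services disagree by about {gap:.0f} km")
```

Swap in your own known coordinates for one of the two points and you get the error of a given service rather than the disagreement between two of them.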
In fact lawyers and the police have absolutely NO excuse to use these kinds of databases in litigation at all as there are better, more accurate tools at their disposal – the courts themselves. In criminal cases a warrant is the preferred method, obtaining subscriber information from the ISP (fixed or cellular) which is far more accurate than any geolocation service because it's data coming from the entity actually providing the connection. In a civil trial you have a discovery subpoena to do pretty much the same thing and for the same reasons.
If you're doing it 'on your own', remember that these tools are as accurate as taking a dart and throwing it not at a map on the wall, but at a Google map display on your computer screen. Sure you'll be out a display, but you won't be potentially facing criminal charges when you go to act on what is basically bullshit data. At the very best, it can be used to advise, but it can be INCREDIBLY off, sometimes by thousands of miles.
Data
The following services were used
- Checkip.org
- IP2location (Product: DB6, updated on 2016-4-1)
- IpInfo.io (Product: API, real-time)
- EurekAPI (Product: API, real-time)
- DB-IP (Product: Full, 2016-4-2)
- IPCIM.com
- MaxMind (Product: GeoLiteCity, updated on 2016-4-5)
- MaxMind (Product GeoIP2)
There were 4 IP addresses used, three residential and one cellular, covering four of the biggest ISPs in the US.
IP addresses
- 32.99.122 (Charter fixed line cable internet connection – K`Tetch)
- 193.166.88 (Verizon 4G cellular connection – K`Tetch )
- 137.147.28 (Comcast fixed line cable internet connection – James)
- 172.126.144.9 (AT&T gigapower fixed line internet connection, less than 6 months old – David)
The first two were located in south metro Atlanta, near Macon. David and James are located approximately half a mile apart in north Cobb county, Georgia.
Raw coordinates
Service | Charter | Verizon | Comcast | AT&T |
checkIP.org | 45.4167, -84.3246 | 32.7977, -117.1322 | NOT TESTED | BLANK RESULT |
IP2Location | 33.95621, -83.98796 | 32.55376, -83.88741 | 34.02342, -84.61549 | 34.02342, -84.61549 |
IPinfo.io | 32.8685, -84.3246 | 32.8975, -83.7536 | 34.0247, -84.5033 | 38.0000, -97.0000 |
EurekAPI | 32.8685, -84.3246 | 33.7981, -84.3877 | 34.1015, -84.5194 | 34.0247, -84.5033 |
DB-IP | 33.9562, -83.988 | 40.7128, -74.0059 | 33.9413, -84.5177 ("Marietta (bedroom)") | 33.8545, -84.2171 |
IPCIM.com | 32.8685, -84.3246 (± 25 mile) | NOT TESTED | 34.0247, -84.5033 | 34.0247, -84.5033 (± 25 mile) |
MaxMind (geoLiteCity) | 32.8685, -84.3246 | 32.8975, -83.7536 | 34.0247, -84.5033 | 38, -97 |
MaxMind (GeoIP2) | 32.8685, -84.3246 | 33.7844, -84.2135 | 34.0247, -84.5033 | 34.0247, -84.5033 |
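One quick way to spot the shared default locations is to count how often each exact coordinate pair turns up. The sketch below transcribes the table above into a Python dictionary (columns in the order Charter, Verizon, Comcast, AT&T, with `None` for untested or blank results) and tallies them. The `tally` helper is just something I made up for this example.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float]

# Transcribed from the table above: Charter, Verizon, Comcast, AT&T.
RESULTS: Dict[str, List[Optional[Coord]]] = {
    "checkIP.org": [(45.4167, -84.3246), (32.7977, -117.1322), None, None],
    "IP2Location": [(33.95621, -83.98796), (32.55376, -83.88741),
                    (34.02342, -84.61549), (34.02342, -84.61549)],
    "IPinfo.io": [(32.8685, -84.3246), (32.8975, -83.7536),
                  (34.0247, -84.5033), (38.0, -97.0)],
    "EurekAPI": [(32.8685, -84.3246), (33.7981, -84.3877),
                 (34.1015, -84.5194), (34.0247, -84.5033)],
    "DB-IP": [(33.9562, -83.988), (40.7128, -74.0059),
              (33.9413, -84.5177), (33.8545, -84.2171)],
    "IPCIM.com": [(32.8685, -84.3246), None,
                  (34.0247, -84.5033), (34.0247, -84.5033)],
    "MaxMind (geoLiteCity)": [(32.8685, -84.3246), (32.8975, -83.7536),
                              (34.0247, -84.5033), (38.0, -97.0)],
    "MaxMind (GeoIP2)": [(32.8685, -84.3246), (33.7844, -84.2135),
                         (34.0247, -84.5033), (34.0247, -84.5033)],
}

def tally(results: Dict[str, List[Optional[Coord]]]) -> Counter:
    """Count how many times each exact coordinate pair was returned."""
    return Counter(c for row in results.values() for c in row if c is not None)

if __name__ == "__main__":
    # Print only the pairs that more than one lookup landed on.
    for coord, n in tally(RESULTS).most_common():
        if n > 1:
            print(coord, n)
```

Any pair that several independent lookups land on exactly is far more likely to be a city or country placeholder than anybody's actual house.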
If you'd rather see them on a map, they're here. (Legend: Charter in green, Verizon in red, Comcast in blue, AT&T in yellow)
NOTE: One data source was extremely interesting in its provision of 11+ decimal places in its results. While this might seem to imply accuracy, it actually underscores how inaccurate the data is. Eight decimal places gives a resolution of 1.1 millimeters, roughly the thickness of a CD/DVD. 11 decimal places, as given in all their results, is going to extremes, with locations given to less than a hair's thickness. It has been rounded down.
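The arithmetic behind those figures is simple enough to check. One degree of latitude is roughly 111 km (I'm assuming 111,320 m here, a common approximation), and each extra decimal place divides that by ten. The function name below is mine.

```python
# Approximate length of one degree of latitude, in metres (assumed value).
METERS_PER_DEGREE_LAT = 111_320

def resolution_m(decimals: int) -> float:
    """Smallest step, in metres, expressible with this many decimal places."""
    return METERS_PER_DEGREE_LAT / 10 ** decimals

if __name__ == "__main__":
    for places in (2, 4, 8, 11):
        print(f"{places:>2} decimal places: {resolution_m(places):.9f} m")
```

Four decimal places is already down to about 11 metres, which is far finer than anything an IP address database can honestly claim.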
The "Marietta (bedroom)" label was actually on the output from their database.
I would like to thank David and James for their help with this. And for obvious reasons, we have forced changes in IP addresses for all our connections (and the release of this article was delayed to ensure that).
This is a repost from Andrew Norton's Politics & P2P blog