How Bad Are Geolocation Tools? Really, Really Bad
from the what-a-mess dept
Geolocation is one of those tools that the less technically minded like to use to feel smart. At its core it's a database, showing locations for IP addresses, but like most database-based tools, the old maxim of GIGO [Garbage In, Garbage Out] applies. Over the weekend Fusion's Kashmir Hill wrote a great story about how one geolocation company has sent hundreds of people to one farm in Kansas for no reason other than laziness. And yes, it's exactly as bad as it sounds.
Most people aren't especially technically minded. Give them a tool, tell them it CAN produce an output, and they'll assume that any output that looks like the best quality possible IS the best one available. It's extremely common with 'forensic evidence' and jurors in court cases, where such evidence is given weight well beyond its actual evidentiary value (to the point that jurors now distrust cases without it). There's even a name for it, "the CSI effect", after one of the TV shows that uses it as a cornerstone.
One of the latest tools to get the blind trust of morons is IP geolocation. At its most basic, it's a database of IP addresses with latitudes and longitudes listed, so when you look up an IP address, you get a pair of coordinates you can treat as the 'origin' of that address.
However, there are a number of problems with that:
- First, what about addresses that don't have a lat/long listed?
- Second, how often are the databases updated?
- Third, how do they deal with cellular or 'mobile' devices?
So let's quickly address them.
Those that don't have a lat/long listed.
Well, there are a few ways to handle it, but the way some chose is simply to guess. The article that started me on this points out that the company MaxMind decided to fall back on the closest 'average' place it could: the geographical center of the US. Except that 39°50'N 98°35'W is a messy decimal (39.8333333N, 98.5833333W), so it used the nearby round numbers 38N, 97W instead. That spot is the front yard of a farm in Kansas.
Other times they guess a town and put the location somewhere in it, although even that can be off by a bit. It can also be off by a lot, as you'll see shortly.
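To put a number on how far that default sits from the point it stands in for, here is a small sketch that converts the degrees-and-minutes center to decimal degrees and measures the great-circle distance to 38N, 97W. It assumes the haversine formula and a mean Earth radius of 6,371 km; the function names are illustrative, not taken from any geolocation product.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; haversine is good to ~0.5%

def dms_to_decimal(degrees: int, minutes: int, seconds: float = 0.0,
                   negative: bool = False) -> float:
    """Convert degrees/minutes/seconds to decimal degrees (south/west negative)."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if negative else value

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

center_lat = dms_to_decimal(39, 50)                 # 39°50'N
center_lon = dms_to_decimal(98, 35, negative=True)  # 98°35'W
default_lat, default_lon = 38.0, -97.0              # the round-number default

offset_km = haversine_km(center_lat, center_lon, default_lat, default_lon)
print(f"center: {center_lat:.7f}, {center_lon:.7f}")
print(f"default is {offset_km:.0f} km ({offset_km / 1.609344:.0f} miles) from it")
```

In other words, the "center of the US" default isn't even at the center; it's a couple of hundred kilometers away from it.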
How often are they updated?
There's no telling. With the current shortage of IPv4 addresses and an ever-expanding list of devices, from cell phones to thermostats and even fridges, IP addresses are shifting around everywhere. There are also company mergers and splits, bankruptcies and so on. So unless the database is updated frequently, there's no chance that what it says will be accurate, and again, we'll see that directly.
Finally, how does it deal with cellular devices?
Simply put, they don't. The handoff mechanism means that you'll often carry one IP address from one tower to the next (otherwise you'd have to terminate and restart any data transfer every time you moved between towers). In addition, most cellular providers hide their customers behind NAT, precisely because of the lack of discrete IPv4 addresses to give out (and their… slowness in migrating to IPv6).
Odds are you're going to get the location of a local network control center or a regional corporate office instead, which means it's practically no use at all.
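One visible sign of carrier NAT is the shared address space reserved for it by RFC 6598, 100.64.0.0/10. The sketch below uses Python's standard ipaddress module to classify an address as seen on the device itself. It is illustrative only: a carrier may equally NAT from ordinary private (RFC 1918) ranges, and either way the public address that websites, and therefore geolocation databases, see belongs to the carrier's gateway and is shared by many subscribers.

```python
import ipaddress

# RFC 6598 shared address space, reserved for carrier-grade NAT (CGNAT)
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")

def classify(address: str) -> str:
    """Rough classification of an address as seen on the subscriber's device."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4 and ip in CGNAT_RANGE:
        return "carrier-grade NAT: the public address belongs to the carrier's gateway"
    if ip.is_private:
        return "private (RFC 1918 or similar): behind some NAT"
    if ip.is_global:
        return "public: this is what websites and geolocation databases see"
    return "special-purpose address"

for addr in ("100.72.15.3", "192.168.1.20", "8.8.8.8"):
    print(addr, "->", classify(addr))
```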
Oh dear...
This all assumes, as well, that entries are made in good faith. One of the more common uses of geolocation is targeted advertising, especially on 'adult websites', where they promise there's a horny woman (or man, if your browsing is detected as such, or the 'content' suggests you may be female) close by. You may also have seen it in the scam adverts on news sites that should know better than to accept low-rate advertising (the ones with easy-to-spot clickbait headlines about insurance 'tricks' and the like).
This means that if you can 'rig' the database, you can expose the stupidity in parts of it, as was best demonstrated by Randall Munroe in his XKCD comic series.
So just how inaccurate are these systems? By far the easiest way to tell is to run some IP addresses whose locations you know through these systems and see how far off they are. So I did.
The most obvious one to start with is my own home connection's IP address. So I tried the link in the story, and boy, was it off! Just for the record, I live on the south side of Atlanta's metro area, near Macon; Walking Dead country, in fact.
That's right, it put me in Ottawa, the capital of Canada, roughly 1900 km (1180 miles) and one whole country off. Part of that comes from the second question: how current the data is. It lists my IP as belonging to Nortel Networks. Problem is, I'm not a Nortel subscriber. No one is; the company was wound down years ago. Yet some databases still list it.
Cellphones don't fare much better either. I used the same service on a 4G Verizon phone sitting at my computer. Its location: San Diego. That's 1900 miles (3050 km) off. Other services gave locations of New York, Atlanta, and Macon.
Wondering if it was just my semi-rural setup that was messed up, I called two friends who live in the Atlanta suburbs (a few streets from each other) and asked for their IP addresses. One uses Comcast, the other AT&T. Maybe things would be better and more accurate in a big-city environment?
I ran the addresses through a number of different GeoIP services, and the results were a very mixed bag. One thing's certain, though: none of the coordinates for any of the four connections gave an accurate location for the person (for obvious reasons I'm not going to give you their addresses, or mine for that matter).
Of them all, only one service, IPCIM.com, gave an error circle (a 25-mile radius) with a location, and it didn't do so for every address. To me that indicates some awareness of its inaccuracy, but the lack of one at other times suggests it just doesn't care.
IPCIM's second and third results (Comcast and AT&T) have the same coordinates, but it's less certain of the third than the second, despite both being off.
There's also something specific to note. There are four providers covered here. Two were tested from exactly the same location, yet their reported locations came nowhere near matching. The other two were IP addresses just streets apart, and their results didn't match well either, although many went to the same default locations, including two that went to the 'lazy US center' investigated in the Fusion piece.
More importantly, of the 30+ geolocation attempts made here, not a single one came within a mile of the actual location (although one was within a mile and a half, and another within 3 miles; again, I'm not going to give out specifics). So for those who want to rely on these services as a source for where something is, the simple answer is "don't". This applies as much to people tracking down whoever is leaving spammy comments as it does to police officers and lawyers seeking to use them in court actions, criminal or civil.
In fact, lawyers and the police have absolutely NO excuse to use these kinds of databases in litigation at all, as there are better, more accurate tools at their disposal: the courts themselves. In criminal cases a warrant is the preferred method, obtaining subscriber information from the ISP (fixed or cellular), which is far more accurate than any geolocation service because the data comes from the entity actually providing the connection. In a civil case you have a discovery subpoena to do pretty much the same thing, for the same reasons.
If you're doing it 'on your own', remember that these tools are about as accurate as throwing a dart, not at a map on the wall, but at a Google Maps display on your computer screen. Sure, you'll be out a display, but you won't potentially be facing criminal charges when you act on what is basically bullshit data. At the very best, these tools can be used as a rough guide, and they can be INCREDIBLY off, sometimes by thousands of miles.
Data
The following services were used
- checkIP.org
- IP2Location (Product: DB6, updated on 2016-4-1)
- IPinfo.io (Product: API, real-time)
- EurekAPI (Product: API, real-time)
- DB-IP (Product: Full, updated on 2016-4-2)
- IPCIM.com
- MaxMind (Product: GeoLiteCity, updated on 2016-4-5)
- MaxMind (Product: GeoIP2)
Four IP addresses were used, three residential and one cellular, covering four of the biggest ISPs in the US.
IP addresses
- 32.99.122 (Charter fixed line cable internet connection – K`Tetch)
- 193.166.88 (Verizon 4G cellular connection – K`Tetch)
- 137.147.28 (Comcast fixed line cable internet connection – James)
- 172.126.144.9 (AT&T GigaPower fixed line internet connection, less than 6 months old – David)
The first two were located in south metro Atlanta, near Macon. David and James are located approximately half a mile apart in north Cobb County, Georgia.
Raw coordinates (latitude, longitude in decimal degrees)
Service | Charter | Verizon | Comcast | AT&T |
checkIP.org | 45.4167, -84.3246 | 32.7977, -117.1322 | NOT TESTED | BLANK RESULT |
IP2Location | 33.95621, -83.98796 | 32.55376, -83.88741 | 34.02342, -84.61549 | 34.02342, -84.61549 |
IPinfo.io | 32.8685, -84.3246 | 32.8975, -83.7536 | 34.0247, -84.5033 | 38.0000, -97.0000 |
EurekAPI | 32.8685, -84.3246 | 33.7981, -84.3877 | 34.1015, -84.5194 | 34.0247, -84.5033 |
DB-IP | 33.9562, -83.988 | 40.7128, -74.0059 | 33.9413, -84.5177 ("Marietta (bedroom)") | 33.8545, -84.2171 |
IPCIM.com | 32.8685, -84.3246 (± 25 mile) | NOT TESTED | 34.0247, -84.5033 | 34.0247, -84.5033 (± 25 mile) |
MaxMind (GeoLiteCity) | 32.8685, -84.3246 | 32.8975, -83.7536 | 34.0247, -84.5033 | 38, -97 |
MaxMind (GeoIP2) | 32.8685, -84.3246 | 33.7844, -84.2135 | 34.0247, -84.5033 | 34.0247, -84.5033 |
If you'd rather see them on a map, they're here. (Legend: Charter in green, Verizon in red, Comcast in blue, AT&T in yellow.)
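The disagreement between services can be put in numbers too. Since the real addresses are withheld, this sketch measures how far apart the services' answers for the same connection are, not how far each is from the truth. It takes the AT&T column from the table above and finds the most distant pair, assuming the haversine formula with a 6,371 km mean Earth radius; the helper names are made up for the example.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between (lat, lon) tuples a and b."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

# AT&T column from the table (checkIP.org returned a blank result)
att = {
    "IP2Location": (34.02342, -84.61549),
    "IPinfo.io": (38.0, -97.0),
    "EurekAPI": (34.0247, -84.5033),
    "DB-IP": (33.8545, -84.2171),
    "IPCIM.com": (34.0247, -84.5033),
    "MaxMind GeoLiteCity": (38.0, -97.0),
    "MaxMind GeoIP2": (34.0247, -84.5033),
}

def largest_disagreement(results):
    """Return (distance_km, service_a, service_b) for the most distant pair."""
    return max(
        (haversine_km(results[s1], results[s2]), s1, s2)
        for s1, s2 in combinations(results, 2)
    )

dist, s1, s2 = largest_disagreement(att)
print(f"{s1} and {s2} disagree by {dist:.0f} km ({dist / 1.609344:.0f} miles)")
```

For a gigabit line in the Atlanta suburbs, two of these services land over a thousand kilometers apart, and the culprit is the Kansas default.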
NOTE: One data source was extremely interesting in providing 11+ decimal places in its results. While this might seem to imply accuracy, it actually underscores how inaccurate the data is. Eight decimal places gives a resolution of about 1.1 millimeters, roughly the thickness of a CD/DVD. Eleven decimal places, as given in all of its results, is going to extremes, specifying locations to far less than the thickness of a hair. Those values have been rounded down here.
The "Marietta (bedroom)" label really did appear in the output from DB-IP's database.
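The resolution arithmetic in the NOTE above is easy to check. This sketch assumes the common approximation of 111.32 km per degree (a degree of latitude actually ranges from about 110.6 km to 111.7 km), and shows that the east-west step shrinks with the cosine of the latitude because meridians converge toward the poles. The function name is illustrative.

```python
import math

# Common approximation for the length of one degree; the true length of a
# degree of latitude varies from about 110.6 km to 111.7 km.
METERS_PER_DEGREE = 111_320.0

def resolution_m(decimal_places: int, latitude: float = 0.0) -> tuple[float, float]:
    """Ground distance spanned by one unit in the last decimal place.

    Returns (north-south meters, east-west meters); the east-west value
    shrinks with cos(latitude).
    """
    step_deg = 10 ** -decimal_places
    ns = step_deg * METERS_PER_DEGREE
    ew = ns * math.cos(math.radians(latitude))
    return ns, ew

for places in (2, 4, 6, 8, 11):
    ns, ew = resolution_m(places, latitude=34.0)  # roughly metro Atlanta
    print(f"{places:>2} places: {ns * 1000:.6g} mm N-S, {ew * 1000:.6g} mm E-W")
```

Four decimal places, as most services in the table report, already resolves to about 11 meters; anything past six is precision the underlying data can't possibly support.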
I would like to thank David and James for their help with this. And for obvious reasons, we have forced changes in IP addresses for all our connections (and the release of this article was delayed to ensure that).
This is a repost from Andrew Norton's Politics & P2P blog
Filed Under: errors, geolocation tools, ip addresses
Reader Comments
Re:
My wife was actually a bit nervous about this piece, because we've had issues in the past (members of Anonymous have tried to dox me, and so did Jeremy Hammond, in an attempt to dissuade me from reporting a parole violation in '09). None of them even came close to where I live, though, so it's not that big of a risk.
But yes, I should maybe do a follow-up using some of the VPNs and proxies I have access to, and see where they come out. Probably something I should get to after I come down from the high of BattleBots filming next week, if anyone's interested?
$20 per car
Re: $20 per car
38N 97W is only the 'geographic center' for the geolocation company because they rounded down and then offset another number (perhaps to see who's scraping their results?)
accuracy... ph yeah
Second, GeoIP generally isn't a tool of absolute precision. Rather, it's a good indicator. If your ISP has properly filled out its information, and if it is consistent about using the same IP blocks in one area (rather than across a wide network area), then you are very likely to see reasonable results.
Also, geo tools like these are NOT the best tools to use, because they do a lot of guesswork along the way. Better tools are those based on user input. As an example, many dating sites use geo-specific advertising to try to entice people to their sites. They use actual user signup data to generate their own geo lists, honing the results based on that information and other geo tools (such as MaxMind, which is usually not that far off the map).
So the end result, as an example, is that your 47.32 address returns reasonable information if you look more closely. It is assigned to Charter Communications (it used to belong to an ISP in Canada, from what I could find), and they have "located" it to Lawrenceville, GA, which is likely where Charter has a regional hub, an office, or perhaps a data switch point.
Now, is that accurate enough to say, land an ICBM in your driveway? Nope. Is it close enough to target a political ad to you? Yes. Is it close enough to target regional marketing? Absolutely. Local marketing? Maybe not quite as much.
It's still a pretty good tool overall, nowhere near perfect, but maintained and updated databases plus user provided data usually leads to pretty good guesses, more than enough to target advertising and get those ads in front of the right people most of the time.
Re: accuracy... ph yeah
A specific point with an error circle might actually be worse in this case, for cities near a border.
Re: Re: accuracy... ph yeah
It really is ludicrous.
Re: Re: Re: accuracy... ph yeah
With any internet connection, your modem will connect to some point of presence (PoP) for your ISP. The PoP is effectively the bridge between customer connections and the ISP's backbone. Each PoP will have a range of addresses that it can hand out to connecting customers. Unless you've got a static IP allocation, you could be handed out any address in that pool.
A large city may contain dozens of PoPs, due to the sheer number of customers each one needs to support. The distance from the PoP to where the customer is will probably be quite short, maybe a few miles at most. The location will be fairly accurate.
In rural areas, there will be fewer PoPs (due to their cost) covering much larger areas. You may have customers connecting in from 50+ miles away. Assuming that there's a PoP in Macon, you could have an IP address handed out to someone in Forsyth one week and Cochran (40mi away) the next. The location will be fairly inaccurate.
In order for a GeoIP database to give any sort of accuracy, it needs to know: (a) where the PoPs are; (b) the size of the area served, and (c) the range of addresses it can hand out. No public GeoIP system knows this information. The only public information is which ISPs own which blocks of IP addresses. Once an ISP owns a block of addresses, it can reallocate them wherever it likes, whenever it likes. It never needs to tell anybody else about these changes.
GeoIP databases work with whatever information they can get their hands on. All they know is that at some point in time, some IP address was associated with some reported location. The rest is guesswork.
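The PoP explanation above can be made concrete with a toy lookup table. In this sketch each hypothetical entry maps a CIDR block to the PoP handing it out and the radius that PoP serves. The blocks are RFC 5737 documentation ranges and the coordinates (roughly Atlanta and Macon) and radii are invented for illustration. A lookup can at best return the PoP plus an error radius, never the subscriber's house, and if the ISP moves part of a block without telling anyone, the table is silently wrong.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PopEntry:
    network: ipaddress.IPv4Network  # block of addresses handed out at this PoP
    name: str                       # hypothetical PoP label
    lat: float                      # the PoP's location, not the customer's
    lon: float
    radius_miles: float             # rough size of the area the PoP serves

# Hypothetical table: documentation ranges, illustrative coordinates and radii
TABLE = [
    PopEntry(ipaddress.ip_network("192.0.2.0/24"), "city PoP", 33.75, -84.39, 5),
    PopEntry(ipaddress.ip_network("198.51.100.0/24"), "rural PoP", 32.84, -83.63, 50),
    # The ISP later moved half of the rural block to the city PoP. Unless the
    # database learns about it, this more specific entry simply won't exist.
    PopEntry(ipaddress.ip_network("198.51.100.128/25"), "city PoP (moved block)",
             33.75, -84.39, 5),
]

def lookup(address: str) -> Optional[PopEntry]:
    """Longest-prefix match: the most specific block containing the address."""
    ip = ipaddress.ip_address(address)
    matches = [entry for entry in TABLE if ip in entry.network]
    return max(matches, key=lambda entry: entry.network.prefixlen, default=None)

for addr in ("198.51.100.10", "198.51.100.200", "203.0.113.5"):
    hit = lookup(addr)
    if hit is None:
        print(f"{addr}: no entry, so a database would have to guess")
    else:
        print(f"{addr}: {hit.name} at ({hit.lat}, {hit.lon}) +/- {hit.radius_miles} mi")
```

Longest-prefix matching is how routers pick between overlapping blocks, which is why the reassigned /25 wins over the /24 here; a database lacking that /25 entry would keep reporting the rural PoP.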
Re: Re: accuracy... ph yeah
Now, if you were using a more commercially based system (say, one seeded in part with IP/city pairs from a credit card processor), you would see a lot more accuracy. Generally ISPs use IP blocks in one area (easier routing), and thus your IP block might turn up within a few miles, close enough for a Russian nuke.
Use the right tools, you get better answers. Use the wrong tools, and you are just a tool :)
Re: Re: Re: accuracy... ph yeah
And the first paragraph refers to a story where LOTS of people have assumed just that.
Me, I'm just quantifying how bad they are... as I say right at the start.
Re: accuracy... ph yeah
I think the conclusion here is that an IP as an identification tool (including everything besides geo) sucks. And yet it is widely used as if it were infallible evidence of whatever. So yes, me receiving Romanian advertising because my browser spoofs my IP is OK, but a Romanian being prosecuted and arrested because of something wrong I did using his IP is quite problematic.
Re: Re: accuracy... ph yeah
Not at all. For the most part, an IP generally does a very good job (because of network design) of tracking down a single end connection point. In fact, it would be almost unreasonable to think that an ISP doesn't know where a connection is.
However, the proliferation of Wi-Fi networks, open Wi-Fi connections, Tor exit nodes, VPNs and the like makes it harder to be truly accurate. If those services were required to maintain logs, though, the accuracy rate would be very high!
Re: Re: Re: accuracy... ph yeah
Not true these days: due to IPv4 address shortages, ISPs are now using NAT to share IP addresses, as well as dynamic address allocation. You need the date, time and port as well for an ISP to locate an endpoint from its logs.
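Why date, time and port all matter can be shown with a toy NAT translation log. The record layout below is hypothetical (carriers log in their own formats), the subscriber IDs are invented, and the address is a documentation range. With only the public IP, several subscribers match; adding the source port and a timestamp narrows it to one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class NatRecord:
    subscriber: str   # hypothetical subscriber/account ID
    public_ip: str    # shared public address on the carrier's NAT gateway
    port_start: int   # port block allocated to this subscriber
    port_end: int
    start: datetime   # allocation valid from...
    end: datetime     # ...until (exclusive)

LOG = [
    NatRecord("sub-A", "203.0.113.7", 1024, 2047,
              datetime(2016, 4, 1, 0, 0), datetime(2016, 4, 1, 12, 0)),
    NatRecord("sub-B", "203.0.113.7", 2048, 3071,
              datetime(2016, 4, 1, 0, 0), datetime(2016, 4, 1, 12, 0)),
    NatRecord("sub-C", "203.0.113.7", 1024, 2047,
              datetime(2016, 4, 1, 12, 0), datetime(2016, 4, 2, 0, 0)),
]

def candidates(public_ip: str, port: Optional[int] = None,
               when: Optional[datetime] = None) -> list[str]:
    """Subscribers consistent with whatever evidence is available."""
    return [
        r.subscriber for r in LOG
        if r.public_ip == public_ip
        and (port is None or r.port_start <= port <= r.port_end)
        and (when is None or r.start <= when < r.end)
    ]

print(candidates("203.0.113.7"))              # IP only
print(candidates("203.0.113.7", port=1500))   # IP + port
print(candidates("203.0.113.7", port=1500, when=datetime(2016, 4, 1, 9, 30)))
```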
Walking Dead country
Now, don't be so pessimistic! The Hawks have a good chance of getting past the Celtics in the first round of the NBA playoffs, and they might even push the Cavs to, say, six games in the second round.
I mean, that is what you were referring to, right?
I've been checking my own location
The closest it came was one time when it decided I was 20 miles away from my actual location.
Nortel Gets A Mention On This Site ...
To listen to the copyright trolls, in their submissions to the court and in their applications for expedited discovery of subscriber information from the ISP, they often (and I mean religiously) say that the geolocation tools they use are 99% accurate in pinpointing that the alleged infringer lives in a given area and is served by a given ISP.
I have always thought that this was baloney, and as Andrew has pointed out in his article, you should see it as baloney too. If you look at the data Andrew included and the tests he ran, there is a lot of hocus pocus in the trolls' claims of 99% accuracy.
There have even been cases where the trolls sued a small business because the geolocation information led them to that IP address as the infringer (mind you, the business owner replied to the court with a WTF), and then promptly dismissed the business owner once it was pointed out to them that it was a business. That alone shows geolocation tools are not as accurate as the trolls claim.
Yet in their boilerplate complaints to the courts, you will see the trolls always claim that their geolocation tools are 99% accurate and that their investigative methods point to the ISP subscriber's IP address as the infringer.
It really is a joke how much the courts are fooled by the B.S. the trolls fill their complaints with to get the whole extortion scheme going, and the trolls continue to get away with it.
Am I Nukeable?
The basic idea is this:
1) Go to the locator service from your home computer;
2) Find out where it indicates you are;
3) If needed, use the Google distance measurement to find out how far the location is from your home.
If the distance is less than 1 mile (1.6 kilometers), then you are nukeable: someone could look up your IP address and route a missile to you to finish you off.
(The 1-mile radius is that of the Hiroshima blast and was chosen for its historic value. Today, a Topol SS-25 missile from Russia has a radius of 7 miles, approximately, so you could use that if you prefer.)
The game is instructive, for two reasons: First of all, it teaches you about the breadth and depth of the data stored online for each and every one of us. It also teaches about the shoddiness of that data, because many people playing the game find they are not nukeable...which means if someone tried to nuke them, it would be a miss, which wouldn't help the people where it does land, one tiny bit.
(Such as that front yard in Kansas, destined to be a volcanic pit.)
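Step 3 of the game, and the verdict, can be sketched in a few lines. This assumes the haversine great-circle distance with a mean Earth radius of 3,958.8 miles in place of Google's distance tool; the 1-mile default follows the Hiroshima figure above, and the 7-mile option follows the commenter's Topol figure. The home coordinates in the usage lines are hypothetical placeholders.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius

def distance_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def is_nukeable(home, reported, blast_radius_miles: float = 1.0) -> bool:
    """True if the reported location lands within the blast radius of home."""
    return distance_miles(*home, *reported) <= blast_radius_miles

home = (34.0, -84.5)            # hypothetical: your real coordinates go here
reported = (34.0247, -84.5033)  # what a lookup service returned
print("Hiroshima (1 mi):", is_nukeable(home, reported))
print("Topol (7 mi):", is_nukeable(home, reported, blast_radius_miles=7.0))
```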
Fun fact
Fun fact 2: Andrew has always been within 3 meters of the center of the observable universe!
Think about that the next time you attend a conference he attends.
Re: Fun fact
Also doesn't count the fact that yesterday I tromped up and down Stone Mountain (elv. 1680ft, and the redneck version of the quentulus quazgar mountains) which is a lot more than 3 meters; and you don't want to know about the flights to the UK, California, etc.
If you're going to do facts, know your facts first!
Of course, all this presupposes that the earth is the center of the observable universe, which is a hypothesis, not a fact. For instance, it's a theoretical construct that we can see equally far in all directions, but all we really know is how far away certain objects that we can see are. And if we have objects that are further away in a rough direction (say, because all the scopes are better in that area; think northern vs. southern hemisphere), then the center is not 'the center'. And of course, if we were a bit like the planet Krikkit, and had a massive dust cloud obscuring part of our view, that too would skew the center.
(claiming astronomical facts against someone that does astrophysics for fun, probably not the best idea either, doubly so
Yes, it's bad. It should be bad. Please stop normalizing this.
That has been all for this public service announcement.
Maxmind is a fraud of itself
I'm a firefighter and fire investigator, and yet incompetent Maxmind has ME blacklisted, so we can't purchase radios for firefighting!
I called up Powerwerx after my order was canceled, and powerwerx said that I was flagged as a high risk for fraud by Maxmind and I couldn't purchase anything over the phone either.