Why Google's Street View WiFi Data Collection Was Almost Certainly An Accident
from the technical-details dept
We've been among those who have believed that Google's collection of WiFi data via its Street View cars was likely an accident -- but some have argued that it is impossible to do such a thing by accident. In fact, in the various lawsuits and legal maneuverings around this mess, many people keep claiming that there's simply no way Google was accidentally collecting this data -- although we've yet to hear a single person explain what Google would possibly want with the data, or to see a single shred of evidence that anything was ever done with the data. However, for those who insist it is impossible for this to have happened by accident, Slashdot points us to a detailed technical analysis of why it almost certainly was an accident, despite all the claims to the contrary. It explains, in great detail, how and why the collection of data packets would occur, mainly to help triangulate where the WiFi network was located -- something that Google has always admitted to doing. The problem was that some of the junk data (a very tiny amount, again, as explained in the article) got caught and retained, when it should have been dumped:
Although some people are suspicious of their explanation, Google is almost certainly telling the truth when it claims it was an accident. The technology for WiFi scanning means it's easy to inadvertently capture too much information, and be unaware of it.
It then goes on to show how all of this works, using a specific example from within a Panera Bread restaurant that has open WiFi, which the author uses to demonstrate just how easy it is to capture stray data, why it would make sense and also just how useless most of that data really would be. It's pretty convincing, but I doubt it will satisfy the conspiracy theorists who are just absolutely positive Google had something nefarious planned.
The key issue, as has been pointed out repeatedly, is that most people arguing nefarious intent don't seem to understand what Google was actually doing. It was trying to map the location of WiFi base-stations, a perfectly legal activity that a small group of companies has been doing for years. But in order to best figure out the location of the networks, it's helpful to have as much data as possible that is traversing the access point. The system doesn't care or need to know what that data is; it just wants as much data as possible for the purpose of triangulating. The problem was that Google's system "kept" the data that it got, even though there's been no evidence presented that the data was ever used for anything (a key point that those screaming "criminal intent" repeatedly gloss over). On top of that, no one even explains why Google would want such data. The little snippets would be so random that it's difficult to come up with any reason why keeping such data would be useful.
Triangulation is a lot harder than you'd think. This is because many things will block or reflect the signal. Therefore, as the car drives by, it wants to get every single packet transmitted by the access-point in order to figure out its location. Curiously, with all that data, Google can probably also figure out the structure of the building, by finding things like support columns that obstruct the signal.
I agree with the conclusion to the post. Just because this was pretty clearly an accident doesn't make it a good thing. Google clearly should have realized this much earlier and never allowed such data to be captured. But those running around screaming about how this was all pre-meditated by Google are going to have to offer up a lot more evidence.
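For a rough sense of why a passing car would want every packet it can hear, here is a minimal sketch of one simple way to estimate an access point's position from signal-strength readings: a weighted centroid. This is purely an illustration and not a description of Google's actual algorithm. The `Reading` type, the field names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One packet heard from the access point (hypothetical structure)."""
    x_m: float         # car position along the road, in meters
    y_m: float         # car position across the road, in meters
    signal_dbm: float  # received signal strength; closer to 0 means stronger


def estimate_ap_location(readings):
    """Weighted centroid: stronger readings pull the estimate toward them."""
    if not readings:
        raise ValueError("need at least one reading")
    # Convert dBm to linear power (milliwatts) so strong readings dominate.
    weights = [10 ** (r.signal_dbm / 10) for r in readings]
    total = sum(weights)
    x = sum(w * r.x_m for w, r in zip(weights, readings)) / total
    y = sum(w * r.y_m for w, r in zip(weights, readings)) / total
    return x, y


# Made-up readings: the signal peaks as the car passes the 10 m mark.
readings = [
    Reading(0.0, 0.0, -80),
    Reading(10.0, 0.0, -50),
    Reading(20.0, 0.0, -80),
]
x, y = estimate_ap_location(readings)
```

With only a handful of readings, a single reflected or blocked packet can skew the estimate badly, which is why collecting more packets per access point helps regardless of what those packets contain.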
What's important about this packet is that Google only cares about the MAC addresses found in the header, and the signal strength, but doesn't care about the payload. If you look further down in the payload [in the example data from an open WiFi network in Panera], you'll notice that it's inadvertently captured a URL.
Take a look again. Even though the access-point MAC address is highlighted, there's extra data in the packet. These extra data will include URLs, fragments of data returned from websites (like images), the occasional password, cookies, fragments of e-mails, and so on. However, the quantity of this information will be low compared to the total number of packets sniffed by Google.
That's the core of this problem. Google sniffed packets, only caring about MAC addresses and SSIDs, but when somebody did an audit, they found that the captured packets occasionally contained more data, such as URLs and e-mail fragments.
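To make the header-versus-payload distinction concrete, here is a small sketch of what "should have been dumped" might look like in code. It assumes a hypothetical, simplified frame structure (`SniffedFrame`) rather than a real 802.11 parser, and it is not taken from Google's software. The point is only that a location-mapping system needs the access point's MAC address and the signal strength, and can drop everything else before storing anything.

```python
from dataclasses import dataclass


@dataclass
class SniffedFrame:
    """A captured packet, simplified for illustration (hypothetical)."""
    ap_mac: str      # access-point MAC address from the header
    signal_dbm: int  # signal strength reported by the radio
    payload: bytes   # whatever user data happened to be in the packet


@dataclass
class LocationRecord:
    """The only fields a WiFi location map actually needs."""
    ap_mac: str
    signal_dbm: int


def to_location_record(frame):
    # Copy only the header-derived fields; the payload is never stored.
    return LocationRecord(frame.ap_mac, frame.signal_dbm)


# Made-up example: a frame on an open network that happens to carry a URL.
frame = SniffedFrame(
    ap_mac="00:11:22:33:44:55",
    signal_dbm=-62,
    payload=b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n",
)
record = to_location_record(frame)
```

Writing the whole `SniffedFrame` to disk instead of the `LocationRecord` is a small difference in code, which is consistent with the argument that retaining payload fragments is an easy mistake to make and to overlook.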
Filed Under: data collection, triangulation, wifi