Using AI To Identify Car Models In 50 Million Google Street Views Reveals A Wide Range Of Demographic Information
from the you-are-what-you-drive dept
Google Street View is a great resource for taking a look at distant locations before travelling, or for visualizing a nearby address before driving there. But Street View images are much more than vivid versions of otherwise flat maps: they are slices of modern life, conveniently sorted by geolocation. That means they can provide all kinds of insights into how society operates, and what the differences are geographically. The tricky part is extracting that information. An article in the New York Times reports on how researchers at Stanford University have applied artificial intelligence (AI) techniques to 50 million Google Street View images taken in 200 US cities. Since analyzing images of people directly is hard and fraught with privacy concerns, the researchers concentrated on a proxy: cars. As an academic paper published by the Stanford team notes (pdf):
Ninety five percent of American households own automobiles, and as shown by prior work cars are a reflection of their owners' characteristics providing significant personal information.
First, the AI system had to be trained to find cars in the Google Street View images. That's easy for humans to do, but hard for computers. The next stage of the work -- identifying car models -- is the reverse: hard for people, but much easier using AI. As another paper reporting on the research (pdf) explains:
the fine-grained object recognition task we perform here is one that few people could accomplish for even a handful of images. Differences between cars can be imperceptible to an untrained person; for instance, some car models can have subtle changes in tail lights (e.g., 2007 Honda Accord vs. 2008 Honda Accord) or grilles (e.g., 2001 Ford F-150 Supercrew LL vs. 2011 Ford F-150 Supercrew SVT). Nevertheless, our system is able to classify automobiles into one of 2,657 categories, taking 0.2 s per vehicle image to do so. While it classified the automobiles in 50 million images in 2 wk, a human expert, assuming 10 s per image, would take more than 15 y to perform the same task.
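The figures in that passage can be checked with a little arithmetic. The sketch below is a back-of-the-envelope calculation, not anything taken from the paper: it assumes one classification per image for all 50 million images, although the quoted 0.2 s figure is per vehicle image and the number of vehicle images may differ.

```python
# Back-of-the-envelope check of the quoted timings.
# Assumption (not from the paper): one classification per image, 50 million images.
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

images = 50_000_000
human_seconds_per_image = 10      # figure quoted for a human expert
machine_seconds_per_image = 0.2   # figure quoted per vehicle image

# Time a single human expert would need, working non-stop.
human_years = images * human_seconds_per_image / SECONDS_PER_YEAR

# Time a single machine would need if it processed the images one after another.
machine_days_serial = images * machine_seconds_per_image / SECONDS_PER_DAY

# How much faster than strictly serial the run would have to be to fit in two weeks.
implied_speedup = machine_days_serial / 14

print(f"human expert: {human_years:.1f} years")
print(f"machine, serial: {machine_days_serial:.0f} days")
print(f"speed-up needed for 2 weeks: {implied_speedup:.1f}x")
```

Under those assumptions the human figure comes out a little under 16 years, consistent with "more than 15 y". The serial machine figure comes out at roughly 116 days, so a two-week run would imply either some parallel processing or fewer than 50 million classifications. That is a reading of the numbers, not something the quote states.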
The difference between the two weeks taken by the AI software and the 15 years a human would need means that it is possible to analyze much larger data collections than before, and to extract new kinds of information. This is done by using existing datasets, such as the American Community Survey, which the US Census Bureau carries out each year, to train the AI system to spot correlations between cars and demographics. The New York Times article lists some of the results that emerge from mining and analyzing the Google Street View images, and adding in metadata from other sources:
The system was able to accurately predict income, race, education and voting patterns at the ZIP code and precinct level in cities across the country.
Car attributes (including miles-per-gallon ratings) found that the greenest city in America is Burlington, Vt., while Casper, Wyo., has the largest per-capita carbon footprint.
Chicago is the city with the highest level of income segregation, with large clusters of expensive and cheap cars in different neighborhoods; Jacksonville, Fla., is the least segregated by income.
New York is the city with the most expensive cars. El Paso has the highest percentage of Hummers. San Francisco has the highest percentage of foreign cars.
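To make the idea of learning correlations between cars and demographics concrete, here is a minimal sketch. It is not the researchers' method: it fits a one-variable least-squares line on purely synthetic numbers, and the names `avg_car_price` and `median_income`, the train/held-out split, and the relationship built into the toy data are all assumptions made for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Synthetic toy data (NOT real survey or Street View data):
# one hypothetical car-derived feature per area, plus a made-up income figure.
rng = random.Random(0)
avg_car_price = [rng.uniform(5, 60) for _ in range(200)]                  # thousands of dollars
median_income = [20 + 1.5 * p + rng.gauss(0, 5) for p in avg_car_price]  # thousands of dollars

# Assumed workflow: fit on areas where survey data is available,
# then predict for areas where only the car-derived feature is known.
train_x, train_y = avg_car_price[:150], median_income[:150]
test_x, test_y = avg_car_price[150:], median_income[150:]

intercept, slope = fit_line(train_x, train_y)
predicted = [intercept + slope * x for x in test_x]
mean_abs_error = sum(abs(p - t) for p, t in zip(predicted, test_y)) / len(test_y)

print(f"fitted line: income = {intercept:.1f} + {slope:.2f} * car price")
print(f"mean absolute error on held-out areas: {mean_abs_error:.1f}")
```

A real system would use many car attributes at once and a more capable model, but the principle is the same: learn the relationship where survey answers exist, then apply it where only street images are available.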
The researchers point out that the rise of self-driving cars with on-board cameras will produce even more street images that could be fed into AI systems for analysis. They also note that walking around a neighborhood with a camera -- for example, in a smartphone -- would allow image data to be gathered very simply and cheaply. And as AI systems become more powerful, it will be possible to extract even more demographic information from apparently innocuous street views. Although that may be good news for academic researchers, datamining offline activities clearly creates new privacy problems at a time when people are already worried about what can be gleaned from datamining their online activities.
Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+
Filed Under: ai, privacy, street view
Companies: google
Reader Comments
Is the project finally finished?
Over the last few months, Google's captcha has required up to several dozen mouse clicks identifying cars or traffic signs before it finally accepted that a user might be human - essentially, Google turned a considerable part of the world's population into mechanical turks to help with their project.
Is Google's project now finished, so we can go back to clicking once or twice to prove we are not machines?
There are lots of other proxies
Satellite dishes are similarly valuable. For example, the number of satellite dishes on a building is a proxy for the number of apartments, and the type (e.g. in the UK, Sky dishes vs larger hotbird dishes) shows the occupants' family origin.
Street trees and the type of local shops (most famously in the UK, a Waitrose) are good too.
Dear Ai, you are not prepared.
Significance?
Didn't the lease game hit the USA?
In the UK, looking at the cars used to be a great way to spot the difference between the good and bad areas.
These days, areas I wouldn't want to walk through are full of leased white 4x4s, while the area with 100,000-mile junkers will normally be the better one.
Re: Significance?
Yes, they're immensely helpful when casing people's homes and other property, planning heists etc., without ever appearing suspicious on video cameras in the process. Kudos to Google for RobberyAid®
Re:
Instead of whining about Google, maybe you could tell me why it's even remotely useful? I know this requires intellectual honesty and a desire to actually communicate, but maybe one of you people might be interested in such a thing.
Re: Is the project finally finished?
It just goes to show that the orgs that added Captcha functionality could be getting used as well. If it cost anything to implement the Captcha, you'd be paying Google to make use of your users' judgement too.
Not sure if the org would think it was a fair trade. It would at least make their users/customers a bit more leery of Captchas, if not incensed that a verification system might be co-opting their eyes for Google's purposes.
Cars
> segregation, with large clusters of expensive and cheap
> cars in different neighborhoods.
High-end cars aren't a reliable indicator of income. There are plenty of hood-rats whose homes are a filthy squalor, with roaches running everywhere and malnourished kids sleeping on the floor, who nevertheless have top-of-the-line rides sitting in their driveways. It's all about what's important to people and for those dirtbags, the car matters more than anything else.
Re: Re: Significance?
No, it's even easier to get it already collected from the DMV.
Re: Cars
Moral judgments aside, this is about trends in areas. (Kind of like how BMI is good for populations, but terribly unreliable for individuals.)