Using AI To Identify Car Models In 50 Million Google Street Views Reveals A Wide Range Of Demographic Information

from the you-are-what-you-drive dept

Google Street View is a great resource for taking a look at distant locations before travelling, or for visualizing a nearby address before driving there. But Street View images are much more than vivid versions of otherwise flat maps: they are slices of modern life, conveniently sorted by geolocation. That means they can provide all kinds of insights into how society operates, and how it varies from place to place. The tricky part is extracting that information. An article in the New York Times reports on how researchers at Stanford University have applied artificial intelligence (AI) techniques to 50 million Google Street View images taken in 200 US cities. Since analyzing images of people directly is hard and fraught with privacy concerns, the researchers concentrated on a proxy: cars. As an academic paper published by the Stanford team notes (pdf):

Ninety five percent of American households own automobiles, and as shown by prior work cars are a reflection of their owners' characteristics providing significant personal information.

First, the AI system had to be trained to find cars in the Google Street View images. That is easy for humans to do, but hard for computers. The next stage of the work -- identifying car models -- is the reverse: hard for most people, but much easier using AI. As another paper reporting on the research (pdf) explains:

the fine-grained object recognition task we perform here is one that few people could accomplish for even a handful of images. Differences between cars can be imperceptible to an untrained person; for instance, some car models can have subtle changes in tail lights (e.g., 2007 Honda Accord vs. 2008 Honda Accord) or grilles (e.g., 2001 Ford F-150 Supercrew LL vs. 2011 Ford F-150 Supercrew SVT). Nevertheless, our system is able to classify automobiles into one of 2,657 categories, taking 0.2 s per vehicle image to do so. While it classified the automobiles in 50 million images in 2 wk, a human expert, assuming 10 s per image, would take more than 15 y to perform the same task.
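The figures in that passage can be sanity-checked with a quick back-of-the-envelope calculation. The sketch below is only illustrative: it takes the quoted numbers (50 million images, 10 s per image for a human, 0.2 s per image for the software, two weeks of wall-clock time) and assumes, as a simplification, one classification per image. The function and variable names are made up for this example and do not come from the paper.

```python
# Back-of-the-envelope check of the timing figures quoted above.
# Assumption: one classification per image (a simplification of the real pipeline).

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def total_seconds(num_images, seconds_per_image):
    """Total time needed if images are processed one after another."""
    return num_images * seconds_per_image


NUM_IMAGES = 50_000_000

# A human expert at 10 s per image, working around the clock.
human_years = total_seconds(NUM_IMAGES, 10) / SECONDS_PER_YEAR

# The software at 0.2 s per image, if it ran strictly one image at a time.
machine_days_serial = total_seconds(NUM_IMAGES, 0.2) / SECONDS_PER_DAY

# Finishing in 14 days therefore implies work running in parallel.
implied_parallelism = machine_days_serial / 14

print(f"Human expert: {human_years:.1f} years")               # about 15.9 years
print(f"Software, serial: {machine_days_serial:.0f} days")    # about 116 days
print(f"Implied parallel speed-up: {implied_parallelism:.1f}x")  # about 8.3x
```

On these assumptions the human figure matches the paper's "more than 15 y", while the two-week figure for the software only works out if several images were being classified at the same time. That is an inference from the arithmetic, not something the quoted passage states.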

The difference between the two weeks taken by the AI software and the 15 years a human would need means that it is possible to analyze much larger data collections than before, and to extract new kinds of information. The analysis uses existing datasets, such as the American Community Survey conducted each year by the US Census Bureau, to train the AI system to spot correlations between cars and demographics. The New York Times article lists some of the results that emerge from mining and analyzing the Google Street View images, and adding in metadata from other sources:

The system was able to accurately predict income, race, education and voting patterns at the ZIP code and precinct level in cities across the country.

Car attributes (including miles-per-gallon ratings) found that the greenest city in America is Burlington, Vt., while Casper, Wyo., has the largest per-capita carbon footprint.

Chicago is the city with the highest level of income segregation, with large clusters of expensive and cheap cars in different neighborhoods; Jacksonville, Fla., is the least segregated by income.

New York is the city with the most expensive cars. El Paso has the highest percentage of Hummers. San Francisco has the highest percentage of foreign cars.

The researchers point out that the rise of self-driving cars with on-board cameras will produce even more street images that could be fed into AI systems for analysis. They also note that walking around a neighborhood with a camera -- for example, in a smartphone -- would allow image data to be gathered very simply and cheaply. And as AI systems become more powerful, it will be possible to extract even more demographic information from apparently innocuous street views. Although that may be good news for academic researchers, datamining offline activities clearly creates new privacy problems at a time when people are already worried about what can be gleaned from datamining their online activities.

Follow me @glynmoody on Twitter or identi.ca, and +glynmoody on Google+

Filed Under: ai, privacy, street view
Companies: google


Reader Comments


  1. Peter, 17 Jan 2018 @ 11:36pm

    Is the project finally finished?

    "find[ing] cars in the Google Street Map images [is] easy for humans to do, but hard for computers, while the next stage of the work -- identifying car models -- is much easier using AI."

    During the last months, Google captcha required up to several dozen mouse clicks identifying cars or traffic signs before it finally accepted that a user might be human - essentially, Google turned a considerable part of the world's population into mechanical turks to help with their project.

    Is Google's project now finished, so we can go back to clicking once or twice to prove we are not machines?

  2. Anonymous Coward, 18 Jan 2018 @ 1:00am
    (This comment has been flagged by the community.)

    Seriously, give up. This website blows. It was ok for like a day or so, but nobody gives a shit.

  3. PaulT, 18 Jan 2018 @ 1:25am

    Re:

    Well... bye. You won't be missed.

  4. Pete Austin, 18 Jan 2018 @ 4:29am

    There are lots of other proxies

    My father always judged a neighborhood by its cars.

    Satellite dishes are similarly valuable. For example, the number of satellite dishes on a building is a proxy for the number of apartments, and the type (e.g. in the UK, Sky dishes vs larger hotbird dishes) shows the occupants' family origin.

    Street trees and the type of local shops (most famously in the UK, a Waitrose) are good too.

  5. Ninja, 18 Jan 2018 @ 4:40am

    Dear Ai, you are not prepared.

  6. Anonymous Coward, 18 Jan 2018 @ 4:50am

    Significance?

    Interesting, but how is the usefulness of this information much different from that of plain old automobile registration data?

  7. steve, 18 Jan 2018 @ 5:13am

    "spot correlations between cars and demographics."

    Didn't the lease game hit the USA?

    In the UK, looking at the cars used to be a great way to spot the difference between the good and bad areas.

    These days, areas I wouldn't want to walk through are full of leased white 4x4s; the area with 100,000-mile junkers will normally be the better one.

  8. Vidiot, 18 Jan 2018 @ 5:17am

    Re:

    Wouldn't that be a great bot post, instead of "Love what you wrote - thumbs up!"

  9. PaulT, 18 Jan 2018 @ 5:17am

    Re: Significance?

    You can't get registration data by driving round the street taking photos.

  10. Anonymous Coward, 18 Jan 2018 @ 6:04am

    "But Street View images are much more than vivid versions of otherwise flat maps: they are slices of modern life, conveniently sorted by geolocation."

    Yes, they're immensely helpful when casing people's homes and other property, planning heists etc., without ever appearing suspicious on video cameras in the process. Kudos to Google for RobberyAid®

  11. PaulT, 18 Jan 2018 @ 6:41am

    Re:

    People like you claim this a lot, but I fail to see how a photo that can be months or years old, taken from the road outside, would be of any real value to a would-be robber. Especially compared to, say, driving down the street and looking at the details that the Street View pictures could never show, or at least ensuring you have an up-to-date photo. Surely criminals would want to at least do a drive-by, rather than hoping that a photo from 3 years ago isn't missing any new security installations for them to be surprised by?

    Instead of whining about Google, maybe you could tell me why it's even remotely useful? I know this requires intellectual honesty and a desire to actually communicate, but maybe one of you people might be interested in such a thing.

  12. Anonymous Coward, 18 Jan 2018 @ 7:26am

    Re:

    Then why are you here?

  13. Anonymous Coward, 18 Jan 2018 @ 8:01am

    Re: Is the project finally finished?

    I thought they were just being extra careful about bot IDing of car images to exploit Captchas. Now it seems potentially more sinister.

    Just goes to show the orgs who got Captcha functionality could just be being used as well. If it cost anything to implement the Captcha, you'd be paying Google to make use of your users' judgement too.

    Not sure if the org would think it was a fair trade. It would at least make their users/customers a bit more leery of Captchas, if not incensed that a verification system might be co-opting their eyes for Google's purposes.

  14. Anonymous Coward, 18 Jan 2018 @ 10:02am

    Re:

    Congrats, you’ve won the Out Of The Blue Award For Most Detached From Reality Comment.

  15. btr1701, 18 Jan 2018 @ 11:37am

    Cars

    > Chicago is the city with the highest level of income
    > segregation, with large clusters of expensive and cheap
    > cars in different neighborhoods.

    High-end cars aren't a reliable indicator of income. There are plenty of hood-rats whose homes are a filthy squalor, with roaches running everywhere and malnourished kids sleeping on the floor, who nevertheless have top-of-the-line rides sitting in their driveways. It's all about what's important to people and for those dirtbags, the car matters more than anything else.

  16. Anonymous Coward, 18 Jan 2018 @ 12:55pm

    Re: Re: Significance?

    > You can't get registration data by driving round the street taking photos.

    No, it's even easier to get it already collected from the DMV.

  17. Valkor, 18 Jan 2018 @ 3:06pm

    Re: Cars

    It's ok that you didn't read the linked articles. That's a lot of work. Did you even read the post here? The car image data was compared to EXISTING data, and it checked out.

    Moral judgments aside, this is about trends in areas. (Kind of like how BMI is good for populations, but terribly unreliable for individuals.)

  18. PaulT, 19 Jan 2018 @ 1:44am

    Re: Re: Re: Significance?

    This is a proof of concept, and an end in and of itself. There's a lot more data to be collected overall than the DMV can provide, and you don't have to bother leaving a trail of requests or anything pesky like that. You can just collect publicly available metadata whenever you wish, and process that to tell you what you want.
