Google Embracing Unintentional Crowdsourcing
from the sneaky-bastards dept
I'm always fascinated by businesses that are built around incentives to multiple parties, where all those incentives align even if not everyone participating is aware of it. One of the best in the business at coming up with such things is Luis von Ahn, who has done research for many years on creating systems online that get other people to do some kind of "work" for you. After CAPTCHA, von Ahn's most famous concept is probably the ESP Game, which gathers much more detailed info about what's in any image by having multiple people play a game to name what's in it. People get more points if they match up keywords faster, encouraging them to be as accurate as possible in defining the key characteristics of an image.
Last year, von Ahn gave a talk at Google, after which he licensed the concept of the ESP Game to Google (though Google's version was too boring to get very much attention). However, it appears that the folks at Google did pick up a few additional lessons in how this concept works. Paul Kedrosky points us to the news that Google is admitting its GOOG-411 project has little to do with taking on the 411 telephone information service, and everything to do with building a better speech recognition system. You see, to build speech recognition, you need many different voices saying many different phonemes (the sounds that make up words) in a variety of accents, tones and pitches. Rather than go out and ask people to speak, Google gets plenty of phonemes just by providing this service.
Cynics may call this exploitation or sharecropping, but it's nothing of the sort. It's giving something of value to get something of value -- even if not everyone is fully aware of what the exchange is really about. Too many people seem to think the idea of "crowdsourcing" is really just about getting the crowd to do work for you -- but that's not it at all. It's about setting up incentives so that everyone involved gets value in some form or another, making it a beneficial transaction for everyone.
Filed Under: 411 service, crowdsourcing, speech recognition
Companies: google
Reader Comments
GOOG-411?
This is an awesome idea!
Re: GOOG-411?
I have been under the impression that they were offering premium positioning for certain key phrases through the service, much like they do through AdWords.
It asks "What city & state? What business name or category?", at which point it lists off a number of related businesses. Pizza, for example, in my location always lists Papa John's first.
Re:
It's pretty common....
Look at any company running services, pay or free, and you'll find the same things happening on the back-end. Do you really think Walmart, Target, etc. toss away all the point-of-sale info, AT&T all the call data, or Comcast the data from its converters? Even some McDonald's fountain machines collect usage data.
And behind every data collection point there is someone else extracting other data. I used to work for a company that combined things like grocery value card transactions with insurance data and car sales, so they could figure out the optimum place to put gas stations (or where to send marketing data, or put up a billboard).
It's a pretty common method of collecting data for researching future products. If you have a 'crowd' using a service, there is always going to be someone manipulating it for gain.
ESP Game
Let's get real here, Google: a huge number of images on the web don't need labelling or can't be given meaningful/useful labels. How many hundreds of thousands of meaningless pictures of spreadsheets or dialogs are there? How many meaningless pictures of factory floors are there? The answer to both, and similar questions, is way too many, and that's the kind of garbage I saw during most attempts.
If you want to label images on the web and enlist the help of others, you need only do three things: 1) identify labelers by expertise and use that; 2) ask for their help; 3) aid the labeler in helping you.
How do you do that? Guess who's looking for images of machinery, of art and paintings, of celebrities, etc. - people who are interested and, frequently, knowledgeable about those subjects. Ask them to help as they search. Let them specify a subject and contribute their knowledge.
Heck, I've had many a boring day that I sat searching through Google images trying to locate something I wanted and would have gladly marked images as irrelevant or provided accurate labels in order to help others. I couldn't do that on 90+% of the images Google asked me to label, but I could do that on the majority of the images that come up in my searches.
Tellme
Luis' Experiment
At one point in class, he made us call a phone number and then say 1 to 10. That's it.
Two months later, I heard my friend's voice in an audio CAPTCHA in reCAPTCHA.
Good stuff.