Making CAPTCHAs Productive

from the now-there's-a-good-idea dept

Thu, May 24th 2007 8:10pm — Mike Masnick

About five years ago, Luis von Ahn was the PhD student who came up with the idea for CAPTCHAs, the little requests to "type this" before you could fill out a form or sign up for a service. These days, of course, such CAPTCHAs have become nearly ubiquitous. Since then, von Ahn has gone on to create other online systems that figured out ways to shift labor resources to users, such as the ESP Game, which is designed to make image search much more effective (and which Google eventually licensed). However, it seems that von Ahn has switched his attention back to CAPTCHAs after recognizing what a productivity drain they must be. The nice thing about the ESP Game is that the end result benefits image search. CAPTCHAs only help weed out spammers and scammers. However, John writes in to let us know that von Ahn's latest work is about making CAPTCHAs useful. What he's done is make it so that the text users have to type comes from scans of books or other printed materials being digitized by Brewster Kahle's Internet Archive project. That way, each time people are simply trying to enter a comment on a website, they're also helping to turn a scanned word into text for the Internet Archive. Of course, if someone were really sneaky, they would just do the same sort of thing and hook it up to Amazon's Mechanical Turk and keep all the earnings. Every time someone entered a comment on a site, it would earn you money. So, if anyone wants to do this, please reserve a cut for me.


Reader Comments

scottbp, 24 May 2007 @ 8:49pm

A nice idea but...
The whole idea of using CAPTCHAs to "shift labor resources to users" seems to me to be a crazy UI design decision.

Personally, when I am designing sites and applications I try to increase usability and decrease cognitive load for the user. This seems backward to me. It also seems counterproductive to put more barriers in front of users right when we are already asking them to search their memories for sign-in or registration-type information.

The ESP Game works because it is a game, and produces useful info as a byproduct. This new CAPTCHA scheme takes something that we should already be doing on the application side (identifying robots) and then makes it even more complicated. I think we should be decreasing the use of tools like CAPTCHAs, not adding more complexity to the system.

Of course I think this idea is quite ingenious, I just don't want to use it on any of the sites I design.

ChurchHatesTucker, 24 May 2007 @ 9:12pm

OK, I may be just slow, but...
Isn't the whole point of a captcha that the text is (a) known, and (b) hard for an OCR program to decipher?

If (a) is true, what's the point?

If (b) is true, how is the challenging system going to know if the response is correct?

Ajax 4Hire, 24 May 2007 @ 10:04pm

I had forgotten about the CAPTCHA,
thanks for the article reminding me of the name.

In the early 21st century these were used to help site owners distinguish between human and machine. The only problem was that graphics engines, facial recognition and increased computing Zs (archaic term for MHz/horsepower) allowed for good, sometimes even better, simple character recognition than a human could manage.

Consider the problem of "recognizing" 1 (one) and l (ell);
upper-case letters entered as lower case; zer0 and Oh.

CAPTCHA was transcended by similar techniques that required a Turing-style test to gestalt the GIF/JPG/MPG/264.

Next used were images/pictures with
a question of "what is this?" answer: flower
Moving beyond the text based recognition to simple images.

But where the CAPTCHA was a minor irritation, the image recognition was more frustrating: multiple valid answers (like the zer0/Oh) caused more ire directed at the site.

A short-lived attempt was made to use near-current-event questions, similar to the World War II "Who won the World Series last year?" questions: a query that only a human, or someone on your side, would know.

CAPTCHA and image queries were followed by secondary email authentication; a user must provide an email address and respond to THAT. This also proved to be relatively easy to overcome, as machine-generated email and email filtering/recognition were advanced enough to parse the query and provide the appropriate response.

There were also some short-lived attempts to validate through the exchange of fractional currency (Microsoft, eBay/Paypal, Oracle all tried Bank/CreditCard/Currency based checks on the assumption that only a human was stupid enough to give up access to a currency exchange account).

By the early teens (2017 uwantwat.com is probably the best early example), sites became indistinguishable from human response in a Turing test. In fact, the best false positive test (machine passes as human) was summed up in the statement:
"to human is to err."

Turing tests started using the statistical expectation of a slightly wrong answer, but again the basic problem is that a machine is trying to authenticate a real human response.
Given sufficient access to the machine, you can craft a complementary machine to give the expected response.

Read your history books, it's all in there.

Anonymous Coward, 25 May 2007 @ 5:28am

"If (b) is true, how is the challenging system going to know if the response is correct?"

I totally agree!

JBB, 25 May 2007 @ 7:01am

Knowing if the response is correct...
The system uses two words in the CAPTCHA. The first is a known word. The second is one the OCR didn't recognize. If the first one is entered correctly, the system knows you're a human. It then records the second one, compares that answer with other people's answers, and if enough agree, it decides that's the unOCRable word.
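
The two-word scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual system's code: the class name `TwoWordCaptcha`, the `threshold` parameter, the word identifiers, and the case-insensitive matching are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict


class TwoWordCaptcha:
    """Illustrative sketch: a known control word plus an unknown scanned word."""

    def __init__(self, threshold=3):
        # Hypothetical setting: how many matching human answers settle a word.
        self.threshold = threshold
        self.votes = defaultdict(Counter)  # unknown-word id -> answer tallies
        self.resolved = {}                 # unknown-word id -> accepted text

    def submit(self, known_text, known_answer, unknown_id, unknown_answer):
        """Return True if the user typed the known word correctly."""
        if known_answer.strip().lower() != known_text.lower():
            # Control word failed: treat as a bot and discard the other answer.
            return False
        guess = unknown_answer.strip().lower()
        if guess and unknown_id not in self.resolved:
            self.votes[unknown_id][guess] += 1
            # Accept the reading once enough humans agree on it.
            if self.votes[unknown_id][guess] >= self.threshold:
                self.resolved[unknown_id] = guess
        return True


captcha = TwoWordCaptcha(threshold=2)
captcha.submit("morning", "morning", "scan-17", "upon")
captcha.submit("morning", "mornin", "scan-17", "spam")  # fails the control word
captcha.submit("morning", "Morning", "scan-17", "upon")
print(captcha.resolved)  # {'scan-17': 'upon'}
```

The answer for the unknown word is only counted when the control word is right, which is what keeps a bot's guesses out of the tally.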

Matt, 25 May 2007 @ 10:46am

Re: OK, I may be just slow, but...
That's what I was about to say...

CAPTCHA is meant to match up text (user input) to text in an image, meaning that the text is already known.

Jim Schrempp, 25 May 2007 @ 3:30pm

ESP game is routinely hacked
A while back I played the Google implementation of The ESP Game and found that it was being hacked. I believe robots meet in it and pollute the results. I documented it all at:

http://www.jimschrempp.com/features/computer/googleimagelabeler.htm

I'd enjoy hearing others' opinions.

Jim