Why ReCaptcha works: The 4 requirements of Crowdsourcing


If the 5 of you can’t decide where to eat lunch without someone saying the dreaded words “up to you”, how in the world can 100,000 or even 1 million people come together to solve a problem? The answer is a mixture of the right collaborative systems, enabled by the right technology, and people with the right skill-sets, motivated by the right things. Get all 4 components right, and you’ve got yourself the answer to any problem.

However, getting these 4 components right is very, very tricky. Some might argue it’s more difficult than solving the task at hand, but a handful of successful crowdsourcing projects have broken the mold and set the tone for others to follow. Today I want to focus on what is quite possibly the biggest crowdsourcing project ever, and, with the exception of Wikipedia, probably the most successful as well: ReCaptcha.

A Captcha is a small check most websites run to make sure you’re a human. The reason the text is all jumbled up and ‘squiggly’ is simply that the squiggly-ness makes it very hard for a computer program to read. Programs struggle to read a ReCaptcha reliably, yet even my 6-year-old niece can identify most ReCaptchas on the first try, which tells us a lot about the difference between man and machine.

Because no computer program can read it, the process of filling in a Captcha can’t be automated, which means I can’t write a script to fill in 10,000 Captchas over and over again. So if I were, say, signing up for a newsletter or registering for a free gift, I couldn’t get a program to sign up hundreds or thousands of times. The process is intentionally manual and slightly time-consuming.

Then Luis von Ahn, one of the inventors of the Captcha, had an even more brilliant idea, which he called ReCaptcha.

Rather than wasting the 10-15 seconds it takes humans to read that squiggly text and type it out as words, he decided to harness that effort in what is undoubtedly one of the biggest crowdsourcing projects of all time. He decided to have people all over the world digitize books for free.

To understand how it works, though, you first have to understand Google Books. Google Books is part of Google’s effort to digitize every single book in existence, including very old texts that are ink-smudged or written in old-fashioned lettering. These imperfections make such texts almost impossible for a computer to convert into digital text, but they look a lot like the intentional squiggly-ness of a Captcha. So the creator of the Captcha thought it would be fantastic to put those hard-to-read words to work, and that became ReCaptcha.

In ReCaptcha you’re given 2 squiggly words instead of 1. The database already knows one of them and uses it to test your human-ness, just like before.

The other word was taken from a scanned text that Google’s computers couldn’t decode into words (let’s call it an unknown), so they give you a chance to help. ReCaptcha displays the unknown to you and records your input. Google may show the same unknown to several people, and once a few of them have typed the same answer, Google digitizes the word using that answer. It’s amazing.
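To make the two-word mechanism concrete, here is a minimal Python sketch. It is not ReCaptcha’s actual code: the `UnknownWord` class, the `submit_challenge` function, the lowercase normalization and the agreement threshold of 3 are all assumptions made for illustration. The known word alone decides whether you pass, and your reading of the unknown word only counts as a vote once you do.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Assumed value: matching answers needed before a word is trusted (hypothetical).
AGREEMENT_THRESHOLD = 3


@dataclass
class UnknownWord:
    """A scanned word the OCR could not read, plus the human guesses for it."""
    word_id: str
    votes: Counter = field(default_factory=Counter)
    resolved: Optional[str] = None


def submit_challenge(control_answer: str, control_expected: str,
                     unknown: UnknownWord, unknown_answer: str,
                     threshold: int = AGREEMENT_THRESHOLD) -> bool:
    """Return True if the user passes; record their reading of the unknown word."""
    # The known (control) word alone decides whether the user looks human.
    if control_answer.strip().lower() != control_expected.strip().lower():
        return False
    # A human's reading of the unknown word counts as one vote.
    if unknown.resolved is None:
        guess = unknown_answer.strip().lower()
        unknown.votes[guess] += 1
        # Once enough people agree, accept that reading as the digitized word.
        if unknown.votes[guess] >= threshold:
            unknown.resolved = guess
    return True


if __name__ == "__main__":
    word = UnknownWord("scan-0042")
    # A bot that misreads the control word is rejected and casts no vote.
    print(submit_challenge("opon", "upon", word, "junk"))  # → False
    for reading in ["morning", "Morning", "mourning", "morning"]:
        submit_challenge("upon", "upon", word, reading)
    print(word.resolved)  # → morning
```

In this sketch, gating the vote on the known word is what keeps random junk out of the digitized text: a program that can’t read the control word never gets to vote on the unknown one.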

You can actually see how good ReCaptcha is compared to a regular computer here.

Now, why is ReCaptcha the biggest crowdsourcing project out there? According to a TED Talk by its founder, Luis von Ahn, the total number of unique people who have helped digitize books stands at… (are you ready?) …750 million, or roughly 10% of the world’s population.

That’s way ahead of the 50,000 people who helped look for Steve Fossett, or even the 15.8 million Wikipedians registered on Wikipedia.

Projects like this address all 4 crowdsourcing requirements: the right collaborative systems, enabled by the right technology, and people with the right skill-sets, motivated by the right things.

The Right Collaborative Systems: ReCaptcha was fantastically simple. It broke a hard problem down to literally one word at a time, then managed to distribute those words and re-compile the answers easily.
The Right Technology: ReCaptcha used the internet and websites, which let it reach a large audience in a short time. Its widgets and add-ons let webmasters add ReCaptchas to their sites easily.
People with the Right Skill-Sets: Fortunately for ReCaptcha, almost any human being qualifies for its task, so anyone with eyes can easily do what’s required.
The Right Motivation: This is 2-pronged. Users were usually motivated by wanting to register online or receive a free gift, and that was enough to get them to decode one word into text. However, that would never have happened if webmasters and administrators didn’t feel the need to check for ‘humans’, or if implementing ReCaptcha were hard. ReCaptcha is a rare breed in getting this part right.
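The “one word at a time” system can be pictured as a split, distribute and re-compile pipeline. Here is a hypothetical Python illustration, not how ReCaptcha actually scheduled its words: `distribute`, `recompile`, the round-robin assignment, 3 readers per word and the 2-vote agreement rule are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import cycle
from typing import Dict, List, Tuple


def distribute(word_images: List[str], users: List[str],
               copies: int = 3) -> Dict[str, List[int]]:
    """Hand each word position to `copies` different users, round-robin."""
    if copies > len(users):
        raise ValueError("need at least as many users as copies per word")
    assignments: Dict[str, List[int]] = defaultdict(list)
    user_cycle = cycle(users)
    for position in range(len(word_images)):
        # Consecutive users from the cycle are distinct because copies <= len(users).
        for _ in range(copies):
            assignments[next(user_cycle)].append(position)
    return dict(assignments)


def recompile(num_words: int, answers: List[Tuple[int, str]],
              min_agree: int = 2) -> str:
    """Rebuild the page text, keeping a word only if enough readers agree."""
    by_position: Dict[int, Counter] = defaultdict(Counter)
    for position, text in answers:
        by_position[position][text.strip().lower()] += 1
    words = []
    for position in range(num_words):
        counts = by_position.get(position)
        if counts:
            best, n = counts.most_common(1)[0]
            if n >= min_agree:
                words.append(best)
                continue
        words.append("[?]")  # not enough agreement yet: leave a gap
    return " ".join(words)


if __name__ == "__main__":
    page = ["img-0", "img-1", "img-2"]  # hypothetical scanned word images
    plan = distribute(page, ["ana", "ben", "cho", "dev"])
    print(plan)  # → {'ana': [0, 1, 2], 'ben': [0, 1], 'cho': [0, 2], 'dev': [1, 2]}
    answers = [(0, "it"), (0, "it"), (0, "lt"),
               (1, "was"), (1, "was"), (1, "was"),
               (2, "dark"), (2, "bark"), (2, "dork")]
    print(recompile(len(page), answers))  # → it was [?]
```

Each user only ever sees a word or two, yet the page comes back together in order; a word nobody agrees on simply stays a gap until more people have read it.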

Without the combination of systems, technology, skill-sets and motivation… none of this would have happened.

Unfortunately, though, the latest news is that ReCaptcha was recently cracked. I’m not sure how true that is :(.

