A friend writes:

I was just buying some tickets and had to type the security word thing. As I was getting it wrong four times in a row, I learned that I was actually helping (or perhaps hurting, by getting them wrong) to:

“Digitize books one word at a time by entering the words in the box, you are also helping to digitize books from the Internet Archive and preserve literature that was written before the computer age.”

I’ve spent my whole day trying to figure out how this could possibly work.

My guess is that the mistakes give them a similarity measure between letters in different fonts. For instance, e’s are similar to a’s because people often mistake an e for an a and vice versa, but e’s are not similar to k’s since people rarely make that mistake. If so, the mistakes would be more useful to them than the correct answers. But I’m not sure what the road is from a similarity matrix to OCR software. Suggestions?
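
To make the guess concrete, here is a minimal sketch of how such a letter-similarity table could be tallied from mistyped answers. It only illustrates the idea in the letter and is not a description of how the actual service works. The names `confusion_counts` and `similarity`, the sample answers, and the assumption that each typed answer can be lined up letter by letter with a known correct word are all made up for the example.

```python
from collections import Counter

def confusion_counts(pairs):
    """Count how often each true letter was typed as a different letter."""
    counts = Counter()
    for truth, typed in pairs:
        if len(truth) != len(typed):
            continue  # assumption: skip answers we cannot align letter by letter
        for true_letter, typed_letter in zip(truth, typed):
            if true_letter != typed_letter:
                counts[(true_letter, typed_letter)] += 1
    return counts

def similarity(counts, a, b):
    """Symmetric score: number of confusions between a and b in either direction."""
    return counts[(a, b)] + counts[(b, a)]

# Hypothetical (correct word, typed word) pairs, invented for illustration.
answers = [
    ("tent", "tant"),
    ("tent", "tant"),
    ("bake", "beke"),
    ("kite", "kite"),
    ("lake", "lakes"),
]

counts = confusion_counts(answers)
print(similarity(counts, "e", "a"))  # e and a get confused in this sample
print(similarity(counts, "e", "k"))  # e and k never do
```

Note that a tally like this needs the correct word to be known in advance, which is exactly what is missing for a word that OCR could not read. That gap is part of why the road from a similarity matrix to OCR software is unclear.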
