Image Understanding & Web Security
Internet services offered for human use are suffering abuse by programs ('bots, spiders, scrapers, spammers, etc.). We mount a defense against such attacks with CAPTCHAs, `Completely Automated Public Turing tests to tell Computers and Humans Apart;' these are special cases of `human interactive proofs' (HIPs), a class of security protocols that allow people to identify themselves easily over networks as members of given groups. I will review five years of evolution in HIP R&D, highlights of the first NSF HIP workshop, and applications of HIPs now in use and on the horizon. One of the best ways to construct a CAPTCHA is to exploit the gap between human and machine ability to read images of text. I will describe two such reading-based CAPTCHAs, developed in collaborations between PARC and UC Berkeley:
PessimalPrint, motivated by studies of physics-based image degradations, uses images synthesized pseudo-randomly over ranges of words, typefaces, and image-quality parameters; and
BaffleText, motivated by the psychophysics of human reading, uses image-masking degradations that seem to require Gestalt perception skills.
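The pseudo-random synthesis described above can be sketched in miniature: sample a word, a typeface, and physics-inspired degradation parameters from fixed ranges, seeded so each challenge is reproducible. This is an illustrative sketch only; the word list, typeface list, and parameter ranges are invented for the example and are not PARC's actual PessimalPrint generator.

```python
import random

# Hypothetical candidate pools -- illustrative only, not the real system's.
WORDS = ["reading", "gestalt", "turing", "proof", "berkeley"]
TYPEFACES = ["Times", "Helvetica", "Courier"]

def sample_challenge(seed):
    """Pseudo-randomly pick a word, a typeface, and degradation
    parameters (blur, binarization threshold, speckle noise) that a
    renderer would then apply to the synthesized text image."""
    rng = random.Random(seed)  # seeded, so the challenge is reproducible
    return {
        "word": rng.choice(WORDS),
        "typeface": rng.choice(TYPEFACES),
        "blur_sigma": round(rng.uniform(0.5, 2.5), 2),  # point-spread width
        "threshold": round(rng.uniform(0.3, 0.7), 2),   # binarization level
        "noise": round(rng.uniform(0.0, 0.2), 2),       # speckle probability
    }

challenge = sample_challenge(42)
```

The same seed always yields the same challenge, so the server need only store the seed and the answer word, not the rendered image.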
Both CAPTCHAs have been validated by experiments on human subjects and commercial OCR machines, and both have so far resisted attack by advanced computer-vision techniques. I'll offer proposals for an image understanding research agenda to further advance the state of the art in web security.
[Joint work with Richard Fateman, Allison Coates, Kris Popat, Monica Chew, Tom Breuel, & Mark Luk.]
Dr. Baird is a Principal Scientist and manager of the Statistical Pattern & Image Analysis research area at the Palo Alto Research Center, a subsidiary of Xerox. He has published three books and sixty-five technical articles, and holds seven patents. He has taught at Princeton and UC Berkeley, and is a Fellow of the IEEE and of the IAPR. With Manuel Blum of CMU, he organized the 1st NSF Int'l HIP Workshop, held at PARC in January 2002.