A Low-cost Attack on a Microsoft CAPTCHA

Jeff Yan and Ahmad Salah El Ahmad

Abstract. CAPTCHA is now almost a standard security technology. The most widely used CAPTCHAs rely on the sophisticated distortion of text images rendering them unrecognisable to the state of the art of pattern recognition techniques, and these text-based schemes have found widespread applications in commercial websites. The state of the art of CAPTCHA design suggests that such text-based schemes should rely on segmentation resistance to provide a security guarantee, as individual character recognition after segmentation can be solved with a high success rate by standard methods such as neural networks. In this paper, we analyse the security of a text-based CAPTCHA designed by Microsoft and deployed for years at many of their online services including Hotmail, MSN and Windows Live. This scheme was designed to be segmentation-resistant, and it has been well studied and tuned by its designers over the years. However, our simple attack has achieved a segmentation success rate of higher than 90% against this scheme. It took ~80 ms for our attack to completely segment a challenge on a desktop computer with a 1.86 GHz Intel Core 2 CPU and 2 GB RAM. As a result, we estimate that this Microsoft scheme can be broken with an overall (segmentation and then recognition) success rate of more than 60%. In contrast, its design goal was that "automatic scripts should not be more successful than 1 in 10,000" attempts (i.e. a success rate of 0.01%). For the first time, we show that a CAPTCHA that is carefully designed to be segmentation-resistant is vulnerable to novel but simple attacks. Our results show that it is not a trivial task to design a CAPTCHA scheme that is both usable and robust.

Draft research paper [PDF]

ACM CCS'08 version [PDF]

Frequently Asked Questions

Q. Who was responsible for this research?
A. This project is joint work by Jeff Yan and Ahmad Salah El Ahmad, both at the School of Computing Science, Newcastle University, England.

Q. Are your programs or source code available?
A. Due to the sensitive nature of this research, we have not released programs or source code at this time.

Q. Have you notified Microsoft about these vulnerabilities? How did they respond?
A. We notified Microsoft of the weakness of their CAPTCHA in September 2007. At their request, our paper was kept confidential until now (10 April 2008). Some feedback from Microsoft on our work:

"...in an effort to show our appreciation for the hard work that you and your colleagues perform in helping us keep our online services products and customers safe, Microsoft has developed a New Security Researchers Acknowledgement Website for Microsoft Online Services.

The new website formally acknowledges Security Researchers that responsibly submit Security vulnerabilities found within Microsoft Online Services Products and applications. Because you responsibly submitted this case to us, we would like your permission to place your name, company, or alias on our new site."

Q. Who else was aware of your attack last year?
A. Luis von Ahn was briefed on our attacks and results when Jeff was visiting him at CMU in Oct, 2007.

Q. Are there other CAPTCHAs vulnerable to your attack?
A. The CAPTCHA deployed at Yahoo until March 5, 2008 was vulnerable to a variant of our attack.

Q. How is this different from other attacks, if any?
A. It was reported in "Automated crack for Windows Live captcha goes wild" (The Register, Feb 8, 2008) that a surge of spam sent from Windows Live accounts had been observed, and that a security firm had analysed a bot to understand what was behind this phenomenon. However, in this reported case, the captcha decoding was done not by the bot but at a remote server. It is unclear whether cheap human labour behind the scenes was feeding captcha answers manually. Even if the server did launch an automated attack, no technical detail of that attack has been revealed to date.

Our method was implemented and tested in the summer of 2007, making it the first effective attack on the Microsoft CAPTCHA. Combined with a recognition engine (a standard technique), our attack can break the Microsoft CAPTCHA with a success rate higher than 60%. By contrast, the success rate observed for the bot in the above report was about 30-35%.
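As a rough illustration of how a segmentation success rate and a per-character recognition rate combine into an overall success rate, here is a minimal Python sketch. The function name and the three input values (segmentation rate, per-character recognition rate, number of characters per challenge) are assumptions chosen for illustration only; this page states just "higher than 90%" for segmentation and "higher than 60%" overall. The sketch also assumes that each character is recognised independently of the others.

```python
def overall_success_rate(segmentation_rate, char_recognition_rate, num_chars):
    """Probability that a challenge is segmented correctly AND every
    segmented character is then recognised correctly.

    Assumes character recognitions are independent of one another.
    """
    return segmentation_rate * char_recognition_rate ** num_chars


# Illustrative inputs (assumptions, not figures quoted on this page).
SEGMENTATION_RATE = 0.92      # consistent with "higher than 90%"
CHAR_RECOGNITION_RATE = 0.95  # hypothetical per-character recogniser accuracy
NUM_CHARS = 8                 # hypothetical challenge length

# Design goal quoted in the abstract: at most 1 success in 10,000 attempts.
DESIGN_GOAL = 1 / 10_000

estimate = overall_success_rate(SEGMENTATION_RATE, CHAR_RECOGNITION_RATE, NUM_CHARS)
print(f"estimated overall success rate: {estimate:.1%}")
print(f"ratio to the design goal: {estimate / DESIGN_GOAL:.0f}x")
```

Under these assumed inputs the product lands just above 60%, which is thousands of times the stated design goal of 0.01%. The point of the sketch is only that, once segmentation is solved, the recognition step no longer keeps the overall rate low.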

From an academic angle, the most interesting bit of our attack is probably the following. The widely accepted "segmentation resistance" principle in CAPTCHA design was in fact established by the team that designed the Microsoft CAPTCHA. Therefore, even if segmentation resistance is a sound principle, the devil is in the details.

Q. Have you broken other CAPTCHAs? Have you discovered other interesting attacks?
A. We have also broken the latest scheme that Yahoo has deployed at its global web sites since March 2008. Our attacks on Yahoo CAPTCHAs are discussed in a recent manuscript, entitled "Is cheap labour behind the scene? - Low-cost automated attacks on Yahoo CAPTCHAs". The manuscript has not been released yet (an abstract is here), but one copy has already been sent to Yahoo.

In our ACSAC'07 paper, entitled Breaking Visual CAPTCHAs with Naive Pattern Recognition Algorithms, we reported a "pixel count" attack that cracks CAPTCHAs by simply counting the number of foreground pixels in each challenge image. This attack has broken quite a few schemes with almost 100% success, including almost all the schemes provided at captchaservice.org (a web service dedicated to CAPTCHA generation).
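To give a feel for the pixel-counting idea, here is a toy Python sketch. It is not the implementation from the ACSAC'07 paper. It assumes a binary image (1 = foreground), characters separated by blank columns, and a hypothetical lookup table mapping each pixel count to one character. All function names, the toy image and the table are made up for illustration.

```python
def split_on_blank_columns(image):
    """Return (start, end) column ranges of runs that contain foreground pixels.

    `image` is a list of equal-length rows holding 0 (background) or 1 (foreground).
    """
    width = len(image[0])
    segments, start = [], None
    for x in range(width):
        has_ink = any(row[x] for row in image)
        if has_ink and start is None:
            start = x                      # a new character begins here
        elif not has_ink and start is not None:
            segments.append((start, x))    # a blank column closes the character
            start = None
    if start is not None:                  # character touching the right edge
        segments.append((start, width))
    return segments


def count_foreground(image, x_start, x_end):
    """Count foreground pixels in columns x_start..x_end-1."""
    return sum(sum(row[x_start:x_end]) for row in image)


def pixel_count_decode(image, count_to_char):
    """Map each segment's pixel count to a character; '?' if the count is unknown."""
    return "".join(
        count_to_char.get(count_foreground(image, x0, x1), "?")
        for x0, x1 in split_on_blank_columns(image)
    )


# Toy data (hypothetical): three "characters" with 3, 5 and 6 foreground pixels.
TOY_IMAGE = [
    [1, 0, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 0, 1, 1],
]
TOY_TABLE = {3: "I", 5: "C", 6: "H"}

decoded = pixel_count_decode(TOY_IMAGE, TOY_TABLE)
print(decoded)
```

A lookup of this kind can only work when a scheme renders each character with a distinctive, stable number of foreground pixels. Distortion that changes the pixel count from one challenge to the next would defeat this toy version.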

Some other interesting attacks are forthcoming.

Q. Why do you do all this?
A. We believe that CAPTCHA will go through the same process of evolutionary development as cryptography, digital watermarking and the like, with an iterative process in which successful attacks lead to the development of more robust systems.

We are responsible security researchers. Our common practice is to report our results to vendors (such as Microsoft and Yahoo) and to allow them ample time to fix the vulnerabilities we have identified before we make our papers publicly available. Essentially, we are doing free consultancy for the vendors to improve their security systems!

Q. Who is funding this research?
A. This is part of our ongoing project: Secure and usable CAPTCHAs. Ahmad Salah El Ahmad is supported by a prestigious Overseas Research Students (ORS) Award and scholarships from both our school and university this year. We are looking for funding to support Ahmad's PhD study on CAPTCHA, a young but important topic, in the coming years. Please contact us if you would like to offer support/help. More papers generated in this project are in the pipeline.

Please send Questions or Comments to Jeff Yan.

University of Newcastle, Computing Science

Lab of Security Engineering (LSE) @ Newcastle