MD5 – The hash algorithm is now Broken!!!

Posted by CRYPTOcrat | Encryption | Monday 5 January 2009 7:16 pm

Author:

This blog is authored by Aniruddha Shrotri a fellow CRYPTOcrat.  Aniruddha is the CTO and Co-founder of E-Lock. E-Lock specializes in Digital Signature & Electronic Signature Software Solutions. Aniruddha has been active in the PKI domain for a long time and in a great position to write a note about what this recent NEWS about MD5 means.

You can find more information about Aniruddha on his LinkedIn Profile.

Abstract

Here is wishing all the CRYPTOcratians a very Happy and Secure New Year! But I am afraid the New Year has brought with it a disturbing piece of news – the hash algorithm MD5 is broken – not just in theory but in practice too.

Here is a very recent and good article http://news.cnet.com/8301-1009_3-10129693-83.html that describes how the weakness in MD5 was successfully exploited to create a fake website with HTTPS that would pass the browser test. Since 2004 it has been known that a weakness existed in MD5 but this is the first time this weakness has been exploited to create a practical live demonstration.

Issues that are raised by this demo

Clearly MD5 is broken and the consequences of this can be quite grave. Here are a few points to ponder over this development:

  • The weakness was demonstrated by faking a certificate so that an SSL enabled fake website can successfully spoof a genuine website. This will mean that when people browse and they see the URL to contain HTTPS and/or a lock icon in the browser status bar, they will no longer be 100% assured that they are looking at a genuine website. If this assurance is gone, then how else is one supposed to verify whether one is dealing with a genuine or a fake website?
  • The resources required to create this exploit were not enormously large – 200 Playstations 3 for 2 weeks. This is certainly not beyond the means of a determined hacker. As the article mentions, this much computing power could probably be acquired for $1500 from Amazon! Consider this effort against the pay-off that a rogue hacker might have in terms of harvesting identity information of hundreds of thousand unsuspecting customers of important financial institutions like banks.
  • In faking a website, another issue that the hacker has to tackle is how to get victims to connect to the rogue website as opposed to the genuine website. There are a number of phishing attacks possible, starting from sending genuine looking fake emails with links that apparently go to the genuine website but the actually URL they have is of the rogue website. More advanced techniques include poisoning the DNS Cache of some common public DNS servers. This is very insidious as there is no genuine way in the hell the victim can make out that he or she is going to the fake website – even the correct URL will take him to the wrong website! Thirdly, the hackers can setup public Wi-fi access points and provide “open” and “free” network access to users. Users of such free Internet won’t even know that some of the links they are visiting are actually spoofs. In all, it is not very difficult to make unsuspecting victims go to a fake website (phishing attacks) and now it turns out it is not difficult either to give a false sense of comfort to victims by enabling the phished web sites with SSL/HTTPS.
  • Currently this attack utilizes weakness in MD5. Remember sometime back it was shown that SHA-1 too is weak (though nowhere near as weak as MD5). It is only a matter of time that similar attack becomes possible on SHA-1 also. This is a grave scenario as the only two hash algorithms in use in practice are MD5 and SHA-1. The new replacement for SHA-1, i.e., SHA-2 and SHA-3 is still a long way out. SHA-3 candidates challenge has only recently been thrown open. And practical implementations of them will take even longer to come.
  • This attack is not only about faking websites – it applies equally to Digital Signature applications as well. Basically, breaking of MD5 means being able to create another set of data whose MD5 hash is identical to the MD5 hash of some given data. This would mean that if I get a digitally signed document, I can create another document whose MD5 has matches with the MD5 hash of the original document and then I can simply transplant the digital signature from the original document to the new document. I can now take this new document with the proper digital signature of the original signer and the signer would never be able to repudiate having signed this new fake document. Imagine changing someone’s will, or changing the terms of a contract this way!
  • One might argue that it may not be worthwhile for an attacker to spend great resources to fake a single document (remember, MD5 hash for different documents would be almost unique, so in order to fake another document, the whole exercise of faking would have to be done again). This may not be true depending on the value of the document, but this is not even required. As demonstrated in this attack referred to in the above link, it is sufficient to create a fake certificate. The fake certificate can then be used to sign any number of documents without having to spend large resources – the resources would be required only to create fake certificate.
  • If a hacker fakes the certificate of an end user, he / she will have to spend again a similar amount of resources to fake each end user’s certificate he / she wants. However, if a CA’s certificate is faked, it can be used to issue fake certificates for any number of users without spending big resources for each end user certificate.

How was the fake certificate created?

A link is given in the article referred to above to a fake SSL enabled site, which the interested parties can see by clicking here: https://i.broke.the.internet.and.all.i.got.was.this.t-shirt.phreedom.org/. If you observe the SSL certificate chain, it seems the middle certificate in the chain is the fake one. That is, the root CA ”Equifax Secure Global eBusiness CA-1” has actually never issued a certificate to “MD5 Collisions Inc. (http://phreedom.org/md5)”, but if you look at the certificate chain, you would be lead to believe that it has. This is the fake certificate.

Question is how it was created. Obviously, they constructed the certificate to contain whatever they want and then transplanted the Equifax root CA’s signature from some other legitimate certificate to this fake one. This of course assumes that the MD5 hash of the to-be-signed part of the both the certificates was identical. But getting the MD5 hash of some exact data you want (the new fake certificate content) to be identical to some pre-determined value (the MD5 hash of the original certificate from which the signature was transplanted) is nearly impossible. So, it would be required to keep adding some extra data to the data you want and keep trying to match the hash. In the case of a certificate, such extra data can easily be put in a field called “certificate extension” which is pretty much allowed to contain anything including your cat’s picture. As long as the extension is not marked “critical”, the receivers are supposed to ignore any extension they do not understand.

So, I had expected to see some unknown extension in the fake certificate with potentially large random looking data, so that when this data was taken together with the rest of the certificate content, the hash would match some pre-decided hash. To my surprise, I found no such extension – all the information in the certificate is standard and well recognized by all browsers.

There is a good and detailed explanation of how the fake certificate was created if you follow the links given on the website. It is worth reading if you are interested in the details and it is explained in terms not far from what laymen would understand. Still it is very intricate in details. To summarize: they did not transplant signature from any old certificate that the root CA had signed – they specially constructed a real certificate, which they got the root CA to sign. They required the root CA to issue certificates with predictable validity period and serial number. They used the chosen prefix collision attack in which some prefix of the both the colliding data can be chosen by the attacker – this alone would probably rule out transplanting the signature from any old certificate onto the fake certificate. The major portion of collision block was absorbed in the RSA Public key modulus. Some “tumor” (as they call it) is visible in the fake certificate in the form of strange content of the “Netscape Comment” extension in the certificate.

What can be done

Having scared you enough with different things the hackers can do to make your life miserable, lets look into what can be done to alleviate the pain a bit:

  • All the CAs and all digital signature softwares should stop using MD5 as algorithm.
  • All CAs should revoke any certificates that used MD5 and re-issue certificates wih SHA1 or better algorithms.Note that this and the previous step actually may not help much. The reason is that the hackers wont be using the CA-issued certificate directly anyway – they will create a new fake certificate that LOOKS like being issued by the said CA and transplant the signature from a previously issued certificate that used MD5. Even though this particular attack required the team to create a new valid certificate that “looked like” the fake certificate they were going to create, it is conceivable that with some improvements, for transplanting, the hacker would just need a certificate issued by the CA that uses MD5 – does not matter whether it is expired, revoked or live.
  • The browsers should start issuing at least a warning when they find any certificate that uses MD5 in a trust chain during an SSL transaction.Given both the points above, it is a wonder that companies like Verisign and Microsoft have chosen to downplay this attack.
  • The digital signature softwares should all start doing the same during any signature verification – flag a warning at least if it detects use of MD5 in the signature or in any certificate up the trust chain.
  • Even though in this instance there was no “unknown” extension in the certificate, it would be prudent for the browsers and digital signature verification software to flag a warning if they find any certificate in the chain to contain an unknown extension (even though marked “not critical”) which contain suspect looking data (i.e., contains ASN Octet String or Bit String data types).

Please feel free to send in your inputs using the comments section below. Additionally you could also use our group’s discussion forum link to send in your comments for Aniruddha.

    3 Comments

    Please note the views expressed in the comments below are that of the commenter and the owners of this website may not agree with the views expressed.

    1. Comment by Vivek Athalye — January 10, 2009 @ 8:34 pm

      indeed its scary …

      In the last point u mentioned that applications should flag a warning if they find “unknown extension”… but even if such a warning is given to user, how will it help? end user like me, will be in no position to make any sense out of the contents of the extension.

      You would have seen lot of sites that show the screen shots of the warning dialogs (typically on download sites or if some activeX needs to be installed) and “guide” the user to continue the operation / process. What if a fraud site “guides” the user in same way to get rid of the warning dialog, how are we going to handle it??

    2. Comment by Aniruddha Shrotri — January 12, 2009 @ 11:26 am

      You have a point, but there is really no substitute for end user’s education. Software alone cannot address all the security concerns all the time. An example is spam detection software — typically it is able to detect what is definitely a spam, but occasionally has only a probabilistic idea about some mails and in that case it is left to the user to determine if he/she considers it spam. Browsers giving an alert for anything found suspect with the certificate will certainly warn the user that there is something potentially wrong with the certificate. In such a case the user should, based on external factors such as user’s familiarity with the website, the criticality of the content that the site is giving etc., decide for himself whether he wants to trust such a site. The applications could, actually given an error instead of warning, but that has the danger that potentially a very few minority of sites which use such genuine certificates might become inaccessible.

      Truely speaking, the problem is not so grave for SSL and websites, but is a little complex for signed documents. That is because, likewise, there is this issue of what should an application do if it finds MD5 (or for that matter MD2 or MD4) in a digital signature or certificate — should it only give a warning or fail the signature verification altogether? If it gave only a warning, one might have the same question — what should an end user do with a warning. On the other hand if it treats it as error, potentially a lot of genuine signed documents might lose their authenticity. I believe that just because an algorithm has been broken (or some key lengths have become weak), it would be too much of a penalty to pay if it rendered all information that uses such algorithms or keys useless. Again, here external factors should come into picture based on which the receiver of the information could decide whether to trust the information or not.

      To precisely address such needs, some standardization bodies are working on defining standards for long term archiving. Some signed documents need to be stored for a long time like 10, 20 or 30 years and one needs to ensure their authenticity at that time. What happens to signed documents over a period of time when the algorithms get broken or keys become weak. Essentially the standards talk about using trusted third party timestamps to clearly mark the time when the signature was done, and then re-signing periodically such documents with latest algorithms and key lengths which are not broken. However, such standards are far from being in deployment and until such times as these standards become ubiquitus, users will have to depend on some external factors to take a call about the trustworthiness of information.

    3. Comment by Vivek Athalye — January 12, 2009 @ 10:19 pm

      hmm… an interesting dilemma …
      to trust or not to trust, is the question!!

      well, thank you for explaining the pros and cons of showing warnings / errors.

      and i look forward to ubiquitous new standards and processes that will help the end user to trust genuine sites/documents without much efforts.

    RSS feed for comments on this post.

    Sorry, the comment form is closed at this time.

    soccerine Wordpress Theme