Cryptographers (the people who make and break codes) have been telling us for twenty years not to use the MD5 algorithm. Why are we still using it?
MD5 and SHA-1 are the two hash algorithms that are used to verify the authenticity of digital evidence. Both algorithms have security issues but SHA-1, which is more popular now, is clearly the stronger of the two. While it makes sense for products to support MD5 for backward compatibility, there is never a good reason to use MD5 for authenticating new evidence.
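For readers who have never computed these fingerprints by hand, here is a minimal sketch using Python's standard hashlib module. The function name file_digests and the chunk size are my own choices for illustration, not part of any forensic tool. It reads a file once and produces both the MD5 and SHA-1 values, which is roughly what an imaging tool does when it reports hashes for an evidence file.

```python
import hashlib

def file_digests(path, algorithms=("md5", "sha1"), chunk_size=1 << 20):
    """Return {algorithm: hex digest} for the file at `path`.

    The file is read in chunks so that large evidence images do not
    have to fit in memory. `file_digests` is a hypothetical helper name.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps reading until read() returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Computing both values costs one extra pass of arithmetic over data that is already being read, so supporting MD5 for backward compatibility does not prevent also recording the stronger hash.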
There are actually two security properties that we care about here. The first is that it should be hard to find any two files that have the same hash (digital fingerprint). The second is that it should be hard to find any file with the same hash value as a specific known file. Conceptually, the first case is like me walking into a room and asking everyone for their birthday to see if I can find any two people with the same birthday. The second case is like me doing the same thing but only looking for someone with my birthday. It's harder to find a match for a specific file with a known hash than to find any two files that match. In forensics, we're usually dealing with the harder case (a specific match).
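The birthday analogy can be tried out on a toy scale. The sketch below is an illustration under stated assumptions, not an attack on MD5: it keeps only the first two bytes (16 bits) of each MD5 value so that matches are easy to find, and the helper names short_hash, find_any_collision and find_match_for are made up for this example. The first search looks for any two inputs that share a short hash; the second looks for an input that matches one specific target.

```python
import hashlib
from itertools import count

def short_hash(data: bytes, nbytes: int = 2) -> bytes:
    """First `nbytes` of MD5: a deliberately weak toy fingerprint."""
    return hashlib.md5(data).digest()[:nbytes]

def find_any_collision(nbytes: int = 2):
    """'Any two birthdays': return (msg_a, msg_b, tries) with equal short hashes."""
    seen = {}  # short hash -> first message that produced it
    for i in count():
        msg = b"file-%d" % i
        h = short_hash(msg, nbytes)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

def find_match_for(target: bytes, nbytes: int = 2):
    """'My birthday': return (msg, tries) where msg matches the target's short hash."""
    goal = short_hash(target, nbytes)
    for i in count():
        msg = b"candidate-%d" % i
        if msg != target and short_hash(msg, nbytes) == goal:
            return msg, i + 1

if __name__ == "__main__":
    a, b, tries_any = find_any_collision()
    m, tries_specific = find_match_for(b"evidence.dd")
    print("any two matching inputs:", a, b, "after", tries_any, "tries")
    print("match for a specific input:", m, "after", tries_specific, "tries")
```

For an n-bit hash with no shortcuts, the first search is expected to take on the order of 2^(n/2) tries and the second on the order of 2^n, so with 16 bits the gap is typically a few hundred tries versus tens of thousands. The exact counts depend on the inputs chosen here.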
In the harder case (matching a specific file), MD5 is vulnerable to a theoretical attack but not a practical one. Against the weaker property (finding any two files that match), MD5 is vulnerable to practical attacks.
If MD5 isn't vulnerable to a practical attack that undermines the way we're using it, why do we care? Two reasons. First, attacks only get better. Cryptographers warned us in 1996 that MD5 had weaknesses, and we didn't get a practical attack until 2004. The attacks against MD5 have gotten markedly better since then. We don't know when we're going to get a practical attack that undermines the other security property, but it seems likely that we will get one given past progress.
The second reason is that most forensics examiners don't understand the difference between the two properties and can't intelligently defend their use of MD5. Hashing is used to provide assurance and a veneer of reliability to digital evidence; it's considered a best practice; and it's required for self-authentication in the proposed amendments to the Federal Rules of Evidence. Given all this, why would we use an algorithm (formula) that cryptography experts recommend against instead of using one that they consider secure? Forensics experts aren't cryptographers and shouldn't try to rely on their own (non-existent) cryptography expertise to justify using an algorithm with known problems. When asked to defend the use of MD5, many forensics experts will point to the fact that the odds of a collision between two random files are 1 in 340,282,366,920,938,463,463,374,607,431,768,211,456. Given those odds, I should have a better chance of winning Powerball for the second consecutive time on the same day that I get bitten by a shark and struck by lightning. In other words, it's practically impossible for two files to ever have the same hash. It just can't happen. Oh, here's two files with the same hash:
If you want to use those for rebuttal, check out the article here and download original copies of the images. There are other examples of files that have the same hashes. Any digital forensics expert who defends MD5 by trying to quote the "one in 340 bajillion" odds should be asked to explain how these files have the same hash and why cryptographers recommend that we do not use MD5. Any expert who would defend MD5 in that way probably doesn't have a good answer.
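If you do download a pair of colliding files, a check along the following lines shows what makes them interesting. This is a rough sketch: compare_files is a hypothetical helper, the paths are whatever you saved the two files as, and it reads both files fully into memory, which is fine for small images but not for disk images. It reports whether the bytes differ, whether the MD5 values match, and whether the SHA-1 values match.

```python
import hashlib

def compare_files(path_a, path_b):
    """Compare two files byte-for-byte and by their MD5 and SHA-1 values."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        data_a, data_b = fa.read(), fb.read()

    identical = data_a == data_b
    same_md5 = hashlib.md5(data_a).digest() == hashlib.md5(data_b).digest()
    same_sha1 = hashlib.sha1(data_a).digest() == hashlib.sha1(data_b).digest()

    return {
        "identical_bytes": identical,
        "same_md5": same_md5,
        "same_sha1": same_sha1,
        # A collision is two *different* files with the same MD5 value.
        "md5_collision": same_md5 and not identical,
    }
```

For a genuine collision pair, the result to look for is identical_bytes False together with same_md5 True. Two files picked at random will essentially never produce that combination, which is exactly why the "one in 340 bajillion" figure says nothing about files that were built on purpose to collide.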
Will using MD5 to authenticate evidence cause it to be rejected? Almost certainly not. In most cases, hashing isn't required anyway. Will it embarrass an expert who gets called out for using it? I hope so.