Subject RE: [firebird-support] Re: detect duplicate blobs, how to?
Author Leyne, Sean
Richard,

> Sean
> Even SHA256 can’t eliminate all possibility of a duplicate. If you have files of
> more than 256 bits in them, by the pigeonhole principle, there WILL be
> duplicates within the universe of all possible files. There HAS to be. The
> probability is very low (but not zero) if you keep your set of files below 2^128
> members, but it is NOT 0.

I agree that there is a possibility, but it is all about understanding scale.
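To put the scale in numbers, here is a minimal sketch of the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / (2 * 2^b)), for the chance that any two of n distinct blobs share a b-bit hash. It assumes the hash behaves like a uniformly random function (a modelling assumption, not a proven property of SHA-256), and `collision_probability` is just an illustrative name.

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that at least two of n distinct inputs share a
    `bits`-bit hash, assuming uniformly random hash outputs
    (birthday approximation)."""
    space = 2 ** bits
    # Integer arithmetic keeps n*(n-1) exact before the single division.
    x = n * (n - 1) / (2 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-x)

for exp in (32, 64, 100, 128):
    p = collision_probability(2 ** exp, 256)
    print(f"2^{exp} blobs, SHA-256: p ~ {p:.3g}")
```

Under this model the probability only becomes noticeable as n approaches 2^128 (about 0.39 at exactly 2^128); at 2^64 blobs it is on the order of 10^-39.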

Consider: 2^128 is the same number space that is used to represent UUID and GUID values, and about 2.71 x 10^18 random (version 4) UUIDs would have to be generated before the chance of even one collision reaches 50% (https://en.wikipedia.org/wiki/Universally_unique_identifier )
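The 2.71 x 10^18 figure can be reproduced by inverting the birthday approximation: the number of uniformly random b-bit values needed for a 50% chance of at least one collision is about sqrt(2 ln 2 * 2^b). A version 4 UUID has 122 random bits (the remaining 6 are fixed version and variant bits), so b = 122 here. `draws_for_even_odds` is a hypothetical helper name, and the calculation assumes the values are truly uniform.

```python
import math

def draws_for_even_odds(bits: int) -> float:
    """Approximate number of uniformly random `bits`-bit values that must be
    drawn before the chance of at least one collision reaches 50%
    (inverse of the birthday approximation)."""
    return math.sqrt(2 * math.log(2) * 2 ** bits)

# Version 4 UUIDs: 128 bits in total, 122 of them random.
print(f"UUIDv4:  {draws_for_even_odds(122):.3g}")
# The same estimate for a 256-bit hash such as SHA-256.
print(f"SHA-256: {draws_for_even_odds(256):.3g}")
```

For a 256-bit hash the same threshold is roughly 4 x 10^38 items, about 20 orders of magnitude beyond the UUID figure.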

Consider: all the words ever spoken by human beings would amount to about 5 exabytes (< 2^63 bytes) (http://highscalability.com/blog/2012/9/11/how-big-is-a-petabyte-exabyte-zettabyte-or-a-yottabyte.html), so what are the chances that more than 2^128 files have been created?
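A quick check of that arithmetic, assuming decimal units (5 exabytes = 5 x 10^18 bytes): the total is just over 2^62 bytes, so even if every single byte were stored as its own file, the count would still fall short of 2^128 by roughly 66 powers of two.

```python
import math

words_bytes = 5 * 10**18  # 5 exabytes, decimal units assumed

print(f"log2(5 EB) = {math.log2(words_bytes):.2f}")  # just over 62
print(words_bytes < 2**63)                           # True
# Shortfall relative to 2^128 if each byte were a separate file.
print(f"shortfall: 2^{128 - math.log2(words_bytes):.1f}")
```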

Consider: the probability that a rogue asteroid crashes into Earth *within the next second*, obliterating civilization-as-we-know-it, is about 10^-15. That is 45 **orders of magnitude** more probable than a SHA-256 collision in the scenario discussed here: (http://stackoverflow.com/questions/4014090/is-it-safe-to-ignore-the-possibility-of-sha-collisions-in-practice)

IMO, the chances of a collision are, for all practical purposes, 0.

But if you still think there is a chance, then use SHA-512.
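If you do want the extra margin, a minimal sketch of duplicate detection keyed on SHA-512 (via Python's standard `hashlib`) could look like this. It assumes the blobs have already been read into memory as a mapping of IDs to bytes; `find_duplicate_blobs` and the ID scheme are hypothetical. As a belt-and-braces step it also compares the raw bytes of blobs that share a digest, which removes the residual collision risk at little cost because matching digests are rare.

```python
import hashlib
from collections import defaultdict

def find_duplicate_blobs(blobs: dict[str, bytes]) -> list[list[str]]:
    """Group blob IDs whose contents are identical.

    Blobs are first bucketed by SHA-512 digest; within each bucket the raw
    bytes are compared too, so a hash collision can never cause a false match.
    """
    by_digest: dict[bytes, list[str]] = defaultdict(list)
    for blob_id, data in blobs.items():
        by_digest[hashlib.sha512(data).digest()].append(blob_id)

    groups: list[list[str]] = []
    for ids in by_digest.values():
        remaining = list(ids)
        while remaining:
            first = remaining[0]
            # IDs in this bucket whose bytes really equal the first blob's.
            same = [i for i in remaining if blobs[i] == blobs[first]]
            if len(same) > 1:
                groups.append(same)
            remaining = [i for i in remaining if blobs[i] != blobs[first]]
    return groups

example = {"a": b"hello", "b": b"world", "c": b"hello"}
print(find_duplicate_blobs(example))  # [['a', 'c']]
```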


Sean