Hash functions

A hash function is a function that transforms data of arbitrary size into a short, fixed-size string of bytes. It’s like being able to give a practically unique identifier to anything. This might seem like nothing, but this simple building block is extremely useful for constructing many things in cryptography and Bitcoin! To be more precise, a hash function takes an arbitrary-length input (a file, a message, a video, and so on) and produces a fixed-length output (for example, 256 bits for SHA-256). Hashing the same input always produces the same output, called the digest or hash, while hashing two similar inputs gives two very different results. Because there are more possible inputs than outputs, two inputs can in principle share a digest (a collision), but for a secure hash function finding one is computationally infeasible.
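As a small illustration of the fixed-length and deterministic behaviour described above, here is a minimal sketch using Python's standard `hashlib` module. The sample inputs and variable names are chosen only for this example.

```python
import hashlib

# Three inputs of very different sizes: empty, 6 bytes, and one million bytes.
inputs = [b"", b"foobar", b"x" * 1_000_000]

# hexdigest() returns the digest as a hexadecimal string.
digests = [hashlib.sha256(data).hexdigest() for data in inputs]

for data, digest in zip(inputs, digests):
    # Every SHA-256 digest is 64 hex characters (256 bits), whatever the input size.
    print(len(data), "bytes ->", digest, f"({len(digest) * 4} bits)")

# Hashing the same input again gives exactly the same digest.
same_again = hashlib.sha256(b"foobar").hexdigest()
print("deterministic:", same_again == digests[1])
```

Whatever the input size, the digest printed on each line has the same length.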

The main property of a cryptographic hash function is that one cannot invert it: it should be infeasible to find the input from just the output. Hash functions are one-way. If I publish the SHA-256 result of some unpredictable input, no one on this planet can feasibly calculate the source I used to get that result; the only general approach is to guess inputs and hash them.
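To make "one-way" a bit more concrete: there is no routine that runs SHA-256 backwards, so the only generic way to look for an input is to guess candidates and hash each one. The sketch below assumes a tiny hypothetical guess list; `guess_preimage` is a name invented for this example, not a library function. It also shows why one-wayness only protects inputs that are hard to guess.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def guess_preimage(target_hex, candidates):
    """Hash each candidate and return the first one that matches, else None.

    This is plain guessing, not an inversion of the hash function.
    """
    for candidate in candidates:
        if sha256_hex(candidate.encode("utf-8")) == target_hex:
            return candidate
    return None

target = sha256_hex(b"foobar")

# Hypothetical guess list: the input is found only because it is in the list.
found = guess_preimage(target, ["foo", "bar", "foobar", "baz"])
# If the input is not among the guesses, the digest reveals nothing useful.
missing = guess_preimage(target, ["foo", "bar"])

print("found:", found)
print("missing:", missing)
```

For an input with enough randomness, such as a long random key, the list of guesses would have to be astronomically large, which is what makes the function one-way in practice.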

SHA-2 and SHA-3 are the two hash function families currently standardized by NIST, and SHA-2 is the most widely deployed. SHA-2 is based on the Merkle–Damgård construction, while SHA-3 is based on the sponge construction. SHA-2 comes in several versions, the main ones producing outputs of 224, 256, 384 and 512 bits (SHA-256 is the most used). Here is a good article to learn more about the mathematics behind SHA-256.
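The output sizes mentioned above can be checked with Python's standard `hashlib` module, which exposes both families. This is only a quick sketch; the list of algorithm names is a selection made for this example, with `sha3_256` included as one SHA-3 variant for comparison.

```python
import hashlib

# Four SHA-2 variants plus one SHA-3 variant for comparison.
names = ["sha224", "sha256", "sha384", "sha512", "sha3_256"]

# digest_size is given in bytes, so multiply by 8 to get bits.
output_bits = {
    name: hashlib.new(name, b"foobar").digest_size * 8
    for name in names
}

for name, bits in output_bits.items():
    print(f"{name}: {bits}-bit output")
```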

Let’s compute the SHA-256 hash of the word “foobar” with this command:

echo -n foobar | sha256sum

The result is: “c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714caef0c4f2”.

Now, let’s calculate the SHA-256 value of “fopbar” with:

echo -n fopbar | sha256sum

The result is: “04006a569077f11c3d1e5f3f5994e10a40d50fb3679ab89b053d1236024002be”.

As you can see, the two results are completely different, even though the inputs differ by a single letter!
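The same comparison can be made bit by bit. The sketch below, again using Python's standard `hashlib`, counts how many bits differ between the two inputs and between their two digests. The helper `differing_bits` is a name made up for this example. For a good hash function one would expect roughly half of the 256 output bits to flip.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

first, second = b"foobar", b"fopbar"

d1 = hashlib.sha256(first).digest()
d2 = hashlib.sha256(second).digest()

# The inputs differ only in one character ('o' versus 'p').
input_diff = differing_bits(first, second)
# The digests differ in a large fraction of their 256 bits.
output_diff = differing_bits(d1, d2)

print("input bits that differ: ", input_diff, "of", len(first) * 8)
print("output bits that differ:", output_diff, "of", len(d1) * 8)
```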

The SHA-1 hash function is now considered unsafe: researchers have demonstrated the first practical SHA-1 collision, generating two different PDF files with the same SHA-1 digest.