Merkle tree

Merkle tree (also known as hash tree) is a data structure used for data verification and synchronization. Its is used in many applications, including Git, BitTorrent, ZFS, and, of course, in cryptocurrencies.

In that tree :

  • Merkle Root is the hash that can be used to verify the integrity of the data structure.
  • Branch node is an internal node of the tree, i.e. a node that does not belong to the base layer.
  • Leaf node is an external node of the tree, i.e. a node belonging to the base layer.
  • Dataset is the initial set of data blocks from which we build the Merkle tree, for example an array of byte values.
graph TD ABCD["Top hash
Hash (Hash(A) + Hash(B) + Hash(C) + Hash (D))"] ABCD --> AB["Hash AB
Hash(Hash(A) + Hash(B))"] ABCD --> CD["Hash CD
Hash(Hash(C) + Hash(D))"] AB --> A["Hash A
Hash(A)"] AB --> B["Hash B
Hash(B)"] CD --> C["Hash C
Hash(C)"] CD --> D["Hash D
Hash(D)"] A --> AData["Dataset
A"] B --> BData["Dataset
B"] C --> CData["Dataset
C"] D --> DData["Dataset
D"]

If you take a look at our example below :

  • A, B, C and D are the datasets.
  • Hash A, Hash B, Hash C and Hash D are the leaf nodes, and they are calculated by hashing the data existing in the corresponding dataset.
  • Hash AB and Hash CD are branch nodes, their hash is the hash of the concatenation of its child’s hash. For Hash AB, you calculate hash(A), then you can calculate hash(B), this gives you two strings, you concatenate them and hash the result.
  • In the top of a hash tree there is the Merkle Root (or top hash or root hash or master hash).

As you can guess, if you change any data, the top hash will be completely different! Integrity verification made easy !

Another great thing about Merkle tree is I can prove you that something is present without providing you the entire data or even the entire Merkle tree.

Let’s go back to our previous example. The person I want to prove something to only know the top hash value and I want to prove that the data in the C data block is “Hello world!”.

graph TD ABCD["Top hash
Hash (Hash(A) + Hash(B) + Hash(C) + Hash (D))"] ABCD --> AB["Hash AB
Hash(Hash(A) + Hash(B))
"] ABCD --> CD["Hash CD
Hash(Hash(C) + Hash(D))"] AB --> A["Hash A
Hash(A)"] AB --> B["Hash B
Hash(B)"] CD --> C["Hash C
Hash(C)"] CD --> D["Hash D
Hash(D)
"] A --> AData["Dataset
A"] B --> BData["Dataset
B"] C --> CData["Dataset
C"
] D --> DData["Dataset
D"]

Remember the person I want to convince only knows the Top hash, here are the only things I have to provide :

  • The value of C (“Hello world!”) so he can calculate Hash(C).
  • The value of Hash(D).
  • He can now calculate Hash CD as he has Hash(C) & Hash(D).
  • The value of Hash(AB) (which is Hash(A) + Hash(B)).
  • With the value of Hash AB I gave him and the value of Hash CD he calculated, he can calculate the Top hash !

If the Top Hash value he knew is equals to the value he just calculated with the data I provided, he can be sure that the value of C is indeed “hello world!”.

Merkle trees are a kind of cryptographic accumulator. Cryptographic accumulators allow you to compact arbitrarily many pieces of data into a constant amount of space. In other words, a Merkle tree lets us represent arbitrarily many blocks while only transmitting a single hash across the wire.