Merkle–Damgård construction

Merkle-Damgård construction is a method of building collision-resistant cryptographic hash functions from collision-resistant one-way compression functions. This construction was used in the design of popular hash algorithms such as MD5, SHA1 and SHA2.

To make it easier to understand:

Merkle-Damgård constructs a variable-length collision resistant hash function $H$ from a fixed-length collision-resistant hash function $h$ .

Construction

Suppose:

Block size is $l$ .
$h$ inputs length $2l$ and outputs length $l$ .
$H$ inputs $x$ whose length is $L$ .

The construction is as follows:

Set block size $B = \lceil \frac{L}{l} \rceil$ . Pad $x$ with $0$ s so that its length is multiple of $l$ . Now we have $l$ -bit blocks $x_1, x_2, \dots, x_B$ .
(Length Padding) Let $x_{B+1} = L$ , where $L$ is encoded using exactly $l$ bits. Now we have $x_{B+1}$ .
Let $z_0 = 0^l$ . This is the initialization vector (IV).
For $i = 1, \dots, B+1$ , compute $z_i = h(z_{i-1} \parallel x_i)$ .
Output $z_{B+1}$ .

A picture is more than a thousand words:

Proof of security

Theorem: If $h$ is collision-resistant, then so is $H$ .

Proof: We prove by contraposition. Suppose $H$ is not collision-resistant, which means we can easily find 2 different inputs $x \neq x'$ such that $H(x) = H(x')$ . Let $x_1, \dots, x_{B+1}$ and $x'_1, \dots, x'_{B'+1}$ be the input blocks. Recall that $x_{B+1} = L$ and $x'_{B'+1} = L'$ . There are 2 cases:

$L \neq L'$ . In this case we only look at the last block:
$h(z_B \parallel x_{B+1}) = z_{B+1} = H(x) = H(x') = z'_{B'+1} = h(z'_{B'} \parallel x'_{B'+1})$
Because $x_{B+1} = L \neq L' = x'_{B'+1}$ , we have $z_B \parallel x_{B+1} \neq z'_{B'} \parallel x'_{B'+1}$ . So we have found a collision for $h$ .
$L = L'$ . This implies $B = B'$ . In this case we look backwards from the last block:
- We have $h(z_B \parallel x_{B+1} ) = z_{B+1} = H(x) = H(x') = z'_{B'+1} = h(z'_{B'} \parallel x'_{B'+1})$ .
  
  If $z_B \neq z'_{B'}$ or $x_{B+1} \neq x'_{B'+1}$ , then we have found a collision for $h$ . Else, we have $x_{B+1} = x'_{B'+1} = x'_{B+1}$ and:
- We have $h(z_{B-1} \parallel x_B) = z_B = z'_{B'} = h(z'_{B'-1} \parallel x'_{B'})$ .
  
  If $z_{B-1} \neq z'_{B'-1}$ or $x_B \neq x'_{B'}$ , then we have found a collision for $h$ . Else, we have $x_B = x'_{B'} = x'_B$ and:
- …
- … Else, we have $x_1 = x'_1$ . But if we get here, then $x_i = x'_i$ for all $i = 1, \dots, B+1$ , which implies $x = x'$ . This is a contradiction.

MD-compliant padding

The padding scheme used in the Merkle–Damgård construction must be chosen carefully to ensure the security of the scheme. The above construction uses length padding, but there are other MD-compliant padding schemes as well. For details about MD-compliant padding, check Wikipedia.

References

Introduction to Modern Cryptography.