Merkle-Damgård construction is a method of building collision-resistant cryptographic hash functions from collision-resistant one-way compression functions. This construction was used in the design of popular hash algorithms such as MD5, SHA1 and SHA2.

To make it easier to understand:

Merkle-Damgård constructs a variable-length collision resistant hash function $H$ from a fixed-length collision-resistant hash function $h$.

## Construction

Suppose:

• Block size is $l$.

• $h$ inputs length $2l$ and outputs length $l$.

• $H$ inputs $x$ whose length is $L$.

The construction is as follows:

1. Set block size $B = \lceil \frac{L}{l} \rceil$. Pad $x$ with $0$s so that its length is multiple of $l$. Now we have $l$-bit blocks $x_1, x_2, \dots, x_B$.

2. (Length Padding) Let $x_{B+1} = L$, where $L$ is encoded using exactly $l$ bits. Now we have $x_{B+1}$.

3. Let $z_0 = 0^l$. This is the initialization vector (IV).

4. For $i = 1, \dots, B+1$, compute $z_i = h(z_{i-1} \parallel x_i)$.

5. Output $z_{B+1}$.

A picture is more than a thousand words:

## Proof of security

Theorem: If $h$ is collision-resistant, then so is $H$.

Proof: We prove by contraposition. Suppose $H$ is not collision-resistant, which means we can easily find 2 different inputs $x \neq x'$ such that $H(x) = H(x')$. Let $x_1, \dots, x_{B+1}$ and $x'_1, \dots, x'_{B'+1}$ be the input blocks. Recall that $x_{B+1} = L$ and $x'_{B'+1} = L'$. There are 2 cases:

1. $L \neq L'$. In this case we only look at the last block:

Because $x_{B+1} = L \neq L' = x'_{B'+1}$, we have $z_B \parallel x_{B+1} \neq z'_{B'} \parallel x'_{B'+1}$. So we have found a collision for $h$.

2. $L = L'$. This implies $B = B'$. In this case we look backwards from the last block:

• We have $h(z_B \parallel x_{B+1} ) = z_{B+1} = H(x) = H(x') = z'_{B'+1} = h(z'_{B'} \parallel x'_{B'+1})$.

If $z_B \neq z'_{B'}$ or $x_{B+1} \neq x'_{B'+1}$, then we have found a collision for $h$. Else, we have $x_{B+1} = x'_{B'+1} = x'_{B+1}$ and:

• We have $h(z_{B-1} \parallel x_B) = z_B = z'_{B'} = h(z'_{B'-1} \parallel x'_{B'})$.

If $z_{B-1} \neq z'_{B'-1}$ or $x_B \neq x'_{B'}$, then we have found a collision for $h$. Else, we have $x_B = x'_{B'} = x'_B$ and:

• … Else, we have $x_1 = x'_1$. But if we get here, then $x_i = x'_i$ for all $i = 1, \dots, B+1$, which implies $x = x'$. This is a contradiction.