minimum edit distance

minimum edit distance is a very classic problem that often shows up in undergrad courses and job interviews (dont worry: they only ask this question in its simplest form, afaik); the solution to this problem is given by a dynamic programming algorithm called Wagner–Fischer algorithm;

the implementation of this algorithm is trivial; for example, this piece of code will pass the test (albeit slow):

import numpy as np
class Solution:
    def minDistance(self, word1: str, word2: str) -> int:
        m, n = len(word1), len(word2)
        dp = np.zeros((m + 1 ,n + 1), dtype=int)
        for i in range(m + 1):
            for j in range(n + 1):
                if i == 0:
                    dp[i][j] = j
                    continue
                if j == 0:
                    dp[i][j] = i
                    continue
                dp[i][j] = min(
                    dp[i - 1][j] + 1,
                    dp[i][j - 1] + 1,
                    dp[i - 1][j - 1] + int(word1[i - 1] != word2[j - 1]),
                )
        return dp[m][n]

more complicated is to prove this algorithm is correct;

trivial facts

there are 3 permitted operations: insertion, substitution, deletion; a quick thought should give this fact: in an optimum solution, a single character should ever be operated at most once; that is, we do not insert then delete, insert then substitute, substitute then substitute, or substitute then delete the same character; a character is either never operated, or inserted, substituted, deleted only once;

another quick thought should give this fact: when either string is empty, the result is simply the length of the other string; this narrows our proof to the induction step only;

proof in the alignment perspective

it is easier to prove if we see this editing problem as an alignment problem; the alignment is character-wise, between the old string and the new string, each having 1 of 4 alignment types:

an inserted character is aligned from a gap;
a substituted character is aligned to its substitution;
a deleted character is aligned to a gap;
an unoperated character is aligned to the same character;

for example:

inte-ntion
||||||||||
3224124444 (alignment types)
||||||||||
-execution

it should be easy to establish this fact: an optimum editing sequence corresponds to an optimum alignment with the same cost, and vice versa; that is, given an optimum editing sequence, we can use it to derive an optimum alignment with the same cost, and vice versa;

going from an alignment to an editing sequence is trivial; to see the other direction, we can reorder the operations: first consider substitutions and unoperations and we know they preserve partial order; then add gaps in old string for insertions; then add gaps in new string for deletions; now we have an alignment with the same cost as the editing sequence;

now we can prove the correctness of the algorithm using alignments; remember we only need to prove the induction step; in each induction step, look at the alignment of the last character and we know there are 4 possibilities at all, which are the 4 alignment types as listed above; the algorithm correctly handles all these cases and so will give the optimum alignment; we know the optimum alignment gives the optimum editing sequence from what we have established;

proof in the editing perspective

we can also prove in the editing perspective; it may be helpful if we use some colors to augment the editing; that is, we color the character being operated in each step; remember that any character is operated at most once, so it will not have 2 or more colors; the color scheme is:

an inserted character is colored red;
a substituted character is colored green;
a deleted character is colored blue;
an unoperated character is colored white;

for example:

inte-ntion
||||||||||
bggwrgwwww (colors)
||||||||||
-execution

when the editing ends, every character (including deleted characters) has a color; we read this color sequence (bggwrgwwww) backwards; in each induction step, there are 4 possibilities, which are all covered by our induction, and so our algorithm is guaranteed to find this optimum solution;

note that here the induction is performed on colors, not on editing operations in sequence order;

this proof does not use alignment at all, but is in essential the same as the alignment-based proof; in fact you can easily notice the bijection between alignment types and colors:

3224124444 (alignment types)
bggwrgwwww (colors)