A quine is a program which outputs itself when it runs. Recursion Theorem tells you how to write a quine for a Turing machine.But you may want to implement one using the programming languages we actually use every day. Here’s how to do it.

First let’s describe the quine in a generic way. (You can skip this if you already know it.) From Recursion Theorem, a quine can be broken into two parts: A and B. Part B’s rule is to first do a computation on the given input, then print the computation result followed by the given input. For now, you don’t need to wonder where to get the input.

The computation can be roughly defined as:

Given an input w, compute the program that can output w.

Although there may be many ways to do this, we make an agreement that we always use a fixed way consistently.

Note that B only depends on the computation. You can write the program B from the rule given above, without knowing A.

Now let’s come to Part A. A’s rule is easy. A just outputs B using the fixed way. Note that A depends on B. But since you’ve already written out B, you can write out A now.

The whole quine Q is AB, which means first A runs then B runs. When A finishes, it will outputs B. When B starts, it will see the input, which is B itself. Then B do the above computation to figure out the program that outputs B. Since we always use a fixed way, the computation result will be exactly A . Therefore, according to B’s rule, B just prints A and B, which is Q when concatenated! Q prints Q!

It’s time to implement it using a real programming language. In this example we use C. Note that we use two different terms output and print, which is not differed in the Turing machine description. They are differed here because we actually write to different places for the two terms. For output, we write to a string (char * in C). For print, we write to console.

Thus B will get its input from a char * variable. Denote this variable by s. A quick thought should give you the implementation of B:

printf("void main() { char *s=\"%s\"; %s }",s,s);

Here the fixed way to output a string means to always assign it to a char * variable s. Since A just outputs B, the program for A can be formed by assigning the above program B to a char * variable s (actually A should do more work such as including void main and curly braces):

void main() { char *s="printf(\"void main() { char *s=\\\"%s\\\"; %s }\",s,s);"; }

Put A and B together, we get:

void main() { char *s="printf(\"void main() { char *s=\\\"%s\\\"; %s }\",s,s);"; printf("void main() { char *s=\"%s\"; %s }",s,s); }

Run it, we get:

void main() { char *s="printf("void main() { char *s=\"%s\"; %s }",s,s);"; printf("void main() { char *s=\"%s\"; %s }",s,s); }

It seems to work, but is affected by escapes. Where does the problem lie? The computation in B is wrong! Originally we think it’s nothing but wrapping it with quotes and prefixing the whole expression with a char *s=. Actually more work needs to be done: the escape. The computation should be done in the following steps:

  • Escape the input string
  • Wrap the escaped string with quotes
  • Prefix the wrapped string with char *s=

But how to implement the escape? If the language you use provides this feature, you’re lucky. But you can always write one yourself. I give a buggy version below (see the final quine), with potential buffer overflow and memory leak (which can be easily fixed).

Then how to make the whole program a quine with this additional function? The trick is that you can treat it as part of A (and part of the computation). This means you add a new step at the beginning of the computation, which is

  • Define a function escape

The behavior of A (and the computation) doesn’t change with this additional step. But you need to rewrite B according to this new computation since the preamble changes. Then make A from the new B. Finally you combine them together to make a quine Q. The final version of Q is given below:

char * escape(char *s) { char *t = malloc(1024); int i = 0,j = 0; for (;i < strlen(s);i++) { if (s[i] == '\\' || s[i] == '\'' || s[i] == '\"') t[j++] = '\\'; t[j++] = s[i]; } t[j] = 0; return t; } void main() { char *s="printf(\"char * escape(char *s) { char *t = malloc(1024); int i = 0,j = 0; for (;i < strlen(s);i++) { if (s[i] == \\\'\\\\\\\\\\\' || s[i] == \\\'\\\\\\\'\\\' || s[i] == \\\'\\\\\\\"\\\') t[j++] = \\\'\\\\\\\\\\\'; t[j++] = s[i]; } t[j] = 0; return t; } void main() { char *s=\\\"%s\\\"; %s }\",escape(s),s);"; printf("char * escape(char *s) { char *t = malloc(1024); int i = 0,j = 0; for (;i < strlen(s);i++) { if (s[i] == \'\\\\\' || s[i] == \'\\\'\' || s[i] == \'\\\"\') t[j++] = \'\\\\\'; t[j++] = s[i]; } t[j] = 0; return t; } void main() { char *s=\"%s\"; %s }",escape(s),s); }

This is exactly the program that will prints itself when it runs. But it is not easy to understand, so I provide the same program with proper line breaks and indentations:

char * escape(char *s)
{
    char *t = malloc(1024);
    int i = 0,j = 0; for (;i < strlen(s);i++) {
        if (s[i] == '\\' || s[i] == '\'' || s[i] == '\"')
            t[j++] = '\\';
        t[j++] = s[i];
    }
    t[j] = 0;
    return t;
}

void main()
{
    char *s="printf(\"char * escape(char *s) { char *t = malloc(1024); int i = 0,j = 0; for (;i < strlen(s);i++) { if (s[i] == \\\'\\\\\\\\\\\' || s[i] == \\\'\\\\\\\'\\\' || s[i] == \\\'\\\\\\\"\\\') t[j++] = \\\'\\\\\\\\\\\'; t[j++] = s[i]; } t[j] = 0; return t; } void main() { char *s=\\\"%s\\\"; %s }\",escape(s),s);";
    printf("char * escape(char *s) { char *t = malloc(1024); int i = 0,j = 0; for (;i < strlen(s);i++) { if (s[i] == \'\\\\\' || s[i] == \'\\\'\' || s[i] == \'\\\"\') t[j++] = \'\\\\\'; t[j++] = s[i]; } t[j] = 0; return t; } void main() { char *s=\"%s\"; %s }",escape(s),s);
}

Don’t be misled by a lot of backslashes, there’re just two rounds of escaping, first from the preamble to B, second from B to A.

Finally, the methods for making a quine can be summarized as three steps:

Figure out the computation.

Make B with the computation built in.

Make A, which just outputs B.

Quiz.

  1. Fix the memory leak.
  2. Fix the buffer overflow.
  3. gcc will give warnings due to some not included headers. Fix it.

(Hint: Add stuff to A.)