one-pass deterministic encryption

encryption algorithms can have some interesting properties:

one-pass:

the algorithm goes over the plaintext only once;

for example, aes-256-ctr with a random iv;
deterministic:

the algorithm encrypts same plaintext into same ciphertext;

for example, aes-256-ctr with a synthetic iv;

this is a two-pass algorithm; in the first pass, calculate a mac out of plaintext; in the second pass, the mac is used as iv in aes encryption;

a natural question: is it possible to combine these two properties?

the problem

to rephrase the problem: is there a secure one-pass deterministic encryption algorithm?

to understand why this problem is meaningful, imagine you have a small router with limited storage but great network connections; you probably want to use it to do something fancier than relaying ip packets; how about encrypting archives? looks like a great job! there may be multiple routers doing the same job; so you want the encryption to be deterministic, which makes dup detection much easier;

to formalize the problem: we have a router with memory size n, which is smaller than the plaintext size N; we want to design a secure one-pass deterministic algorithm to encrypt plaintext on this router; the algorithm works by feeding 1 bit of plaintext at a time, terminated by a EOF character; before EOF, the router is free to output or keep silent; on EOF, the router needs to flush its output, generating the full ciphertext;

the requirement of small memory is sound; if the router has as much memory as plaintext size, then it could simply cache plaintext and access whenever it needs to; that blurs the boundary between one-pass and multi-pass algorithms;

the answer

conclusion first: there is no secure one-pass deterministic encryption algorithm;

the intuition: since the router has little memory, when fed with a large volume of input, will have to output; the leading bits in output do not depend on the trailing bits in input, because those bits havent been read yet; this correlates ciphertext prefix to plaintext prefix; thus, an adversary may discover something about plaintext when looking at ciphertext;

the proof

we give a proof that requires the encryption to be nontrivial, in that the first bit of ciphertext over all plaintexts needs to be non-negligibly variable (not a constant but distributed between 0 and 1);

it is easy to see the smaller the router memory, the easier the proof; thus we prove for the hardest case where router memory is linear of input, but smaller by a constant factor: N = k * n; for example, N = 100 * n;

there are certain n+1-bit inputs that make router output something;

due to limited memory size n, router can have at most 2^n states if it keeps silent; if there are more than 2^n n+1-bit inputs that keep router silent, then there will be at least two different inputs leaving router in the same state; once they are in the same state, they are always in the same state if we feed the same input to both; eventually the encryption generates the same ciphertext for two different plaintexts, which is invalid;

this means there can be at most 2^n n+1-bit inputs that keep router silent; there are totally 2^(n+1) different n+1-bit inputs; this means if we randomly pick a n+1-bit input, we have at most 1/2 probability of keeping router silent;
same as above, but for 2n-bit inputs;

router still can have at most 2^n states; so we know if we randomly pick a 2n-bit input, we have at most (2^n) / (2^(2n)) = 1/(2^n) probability of keeping router silent; this is negligible;

put it another way, we can almost always expect some output after feeding 2n-bit input;
the cpa attack:
1. generate a random 100n-bit plaintext m and send it to router; wlog, suppose its ciphertext c begins with 0;
2. let m_0 = m || 0; note that m_0 and m have the same 2n-bit prefix;
3. generate another random 100n-bit plaintext m_1;
4. send m_0 and m_1 to router; get ciphertext c_b;
5. guess b = 0 if c_b begins with 0, b = 1 otherwise;
probability of guessing right:
- because m_0 and m have the same 2n-bit prefix, and encryption of m outputs the first bit 0 during the 2n-bit prefix (before EOF), we know c_0 and c both begin with 0;
- if c_1 begins with 1, we always guess right no matter what b is;
- if c_1 begins with 0, we always guess b = 0; since b is uniformly random chosen, we guess right with probability 1/2;
- let p be the probability c_1 begins with 1; the probability of guessing right is 1 * p + 1/2 * (1 - p) = (1 + p) / 2; as long as p is non-negligible, this is non-negligibly higher than 1/2;