onepass deterministic encryption
encryption algorithms can have some interesting properties:

onepass:
the algorithm goes over the plaintext only once;
for example, aes256ctr with a random iv;

deterministic:
the algorithm encrypts same plaintext into same ciphertext;
for example, aes256ctr with a synthetic iv;
this is a twopass algorithm; in the first pass, calculate a mac out of plaintext; in the second pass, the mac is used as iv in aes encryption;
a natural question: is it possible to combine these two properties?
the problem
to rephrase the problem: is there a secure onepass deterministic encryption algorithm?
to understand why this problem is meaningful, imagine you have a small router with limited storage but great network connections; you probably want to use it to do something fancier than relaying ip packets; how about encrypting archives? looks like a great job! there may be multiple routers doing the same job; so you want the encryption to be deterministic, which makes dup detection much easier;
to formalize the problem: we have a router with memory size n
, which is
smaller than the plaintext size N
; we want to design a secure onepass
deterministic algorithm to encrypt plaintext on this router; the algorithm works
by feeding 1 bit of plaintext at a time, terminated by a EOF
character; before
EOF
, the router is free to output or keep silent; on EOF
, the router needs
to flush its output, generating the full ciphertext;
the requirement of small memory is sound; if the router has as much memory as plaintext size, then it could simply cache plaintext and access whenever it needs to; that blurs the boundary between onepass and multipass algorithms;
the answer
conclusion first: there is no secure onepass deterministic encryption algorithm;
the intuition: since the router has little memory, when fed with a large volume of input, will have to output; the leading bits in output do not depend on the trailing bits in input, because those bits havent been read yet; this correlates ciphertext prefix to plaintext prefix; thus, an adversary may discover something about plaintext when looking at ciphertext;
the proof
we give a proof that requires the encryption to be nontrivial, in that the first
bit of ciphertext over all plaintexts needs to be nonnegligibly variable (not a
constant but distributed between 0
and 1
);
it is easy to see the smaller the router memory, the easier the proof; thus we
prove for the hardest case where router memory is linear of input, but smaller
by a constant factor: N = k * n
; for example, N = 100 * n
;

there are certain
n+1
bit inputs that make router output something;due to limited memory size
n
, router can have at most2^n
states if it keeps silent; if there are more than2^n
n+1
bit inputs that keep router silent, then there will be at least two different inputs leaving router in the same state; once they are in the same state, they are always in the same state if we feed the same input to both; eventually the encryption generates the same ciphertext for two different plaintexts, which is invalid;this means there can be at most
2^n
n+1
bit inputs that keep router silent; there are totally2^(n+1)
differentn+1
bit inputs; this means if we randomly pick an+1
bit input, we have at most1/2
probability of keeping router silent; 
same as above, but for
2n
bit inputs;router still can have at most
2^n
states; so we know if we randomly pick a2n
bit input, we have at most(2^n) / (2^(2n)) = 1/(2^n)
probability of keeping router silent; this is negligible;put it another way, we can almost always expect some output after feeding
2n
bit input; 
the cpa attack:

generate a random
100n
bit plaintextm
and send it to router; wlog, suppose its ciphertextc
begins with0
; 
let
m_0 = m  0
; note thatm_0
andm
have the same2n
bit prefix; 
generate another random
100n
bit plaintextm_1
; 
send
m_0
andm_1
to router; get ciphertextc_b
; 
guess
b = 0
ifc_b
begins with0
,b = 1
otherwise;
probability of guessing right:

because
m_0
andm
have the same2n
bit prefix, and encryption ofm
outputs the first bit0
during the2n
bit prefix (beforeEOF
), we knowc_0
andc
both begin with0
; 
if
c_1
begins with1
, we always guess right no matter whatb
is; 
if
c_1
begins with0
, we always guessb = 0
; sinceb
is uniformly random chosen, we guess right with probability1/2
; 
let
p
be the probabilityc_1
begins with1
; the probability of guessing right is1 * p + 1/2 * (1  p) = (1 + p) / 2
; as long asp
is nonnegligible, this is nonnegligibly higher than1/2
;
