Subject Block Encryption, Initialization Vector, and Security
Author Jim Starkey
I'd like to suggest a short diversion into cryptology with regard to
ECB (electronic code book), CBC (cipher block chaining), initialization
vectors, Firebird, and security. It's not important, but it is interesting.

There are two ways to analyze a crypto-system. One is to analyze
possible attacks attempting to reconstruct the encryption key. The
other is "information leakage", where information is deduced from the
cryptotext alone.

As discussed earlier, ECB encrypts each 16-byte segment of plaintext
independently, while CBC XORs the previous block of cryptotext with the
next block of plaintext before encrypting it. To get things going, CBC
XORs the first block with an initialization vector. The advantage of
CBC is not that it makes the encryption harder to break -- it doesn't.
What it does do is prevent information leakage by ensuring that
identical blocks of plaintext are not represented as recognizably
identical blocks of cryptotext.
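
The difference can be sketched in a few lines of Python. This is only an
illustration: toy_encrypt_block is a hypothetical stand-in for AES (a
small Feistel permutation built on SHA-256, not secure), and the key,
the all-zero IV, and the helper names are assumptions made up for the
example. The input is assumed to be a whole number of blocks, so there
is no padding.

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for AES: a 4-round Feistel permutation. NOT secure."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:8]
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right


def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, plaintext: bytes) -> list:
    # Every block is encrypted independently of its neighbours.
    return [toy_encrypt_block(key, b) for b in split_blocks(plaintext)]


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> list:
    # Each plaintext block is XORed with the previous cryptotext block
    # (the IV for the first block) before it is encrypted.
    out, prev = [], iv
    for b in split_blocks(plaintext):
        prev = toy_encrypt_block(key, bytes(x ^ y for x, y in zip(b, prev)))
        out.append(prev)
    return out


key = b"demo key"                      # made-up key for the example
plaintext = b"A" * BLOCK * 4           # four identical plaintext blocks
ecb = ecb_encrypt(key, plaintext)
cbc = cbc_encrypt(key, bytes(BLOCK), plaintext)

print("distinct ECB blocks:", len(set(ecb)))
print("distinct CBC blocks:", len(set(cbc)))
```

Under ECB the four identical plaintext blocks collapse to a single
repeated cryptotext block, which is exactly the leak; under CBC the
chaining makes each block look different even though the key and the
cipher are the same.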

Wikipedia gives a graphic, though slightly disingenuous, example of the
difference, showing that an image of the Linux penguin encrypted with
ECB still looks like the Linux penguin, albeit fuzzy and gray scale
rather than color. This is because every block of uniform color (it's a
four color image) maps to the same cryptotext. The areas bordering
color changes are all noise. This phenomenon is valid only if the
original image is an uncompressed bitmap. If the original image were
jpeg or gif, for example, the compression would destroy the simple
mapping of solid color to solid color.
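
A rough way to see this without any cipher at all: ECB maps equal
plaintext blocks to equal cryptotext blocks, so counting repeated
16-byte plaintext blocks shows how much structure would survive. The
sketch below assumes a made-up 64x64 image of four solid color bands and
uses zlib as a stand-in for image compression (it is not jpeg or gif).

```python
import zlib

BLOCK = 16  # cipher block size in bytes


def repeated_blocks(data: bytes) -> int:
    """Count whole blocks that duplicate an earlier block."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data) - BLOCK + 1, BLOCK)]
    return len(blocks) - len(set(blocks))


# Hypothetical 64x64 image, one byte per pixel, four solid color bands.
raw = b"".join(bytes([color]) * 1024 for color in range(4))
compressed = zlib.compress(raw)

print("raw:       ", len(raw), "bytes,", repeated_blocks(raw), "repeated blocks")
print("compressed:", len(compressed), "bytes,", repeated_blocks(compressed), "repeated blocks")
```

The raw bitmap is almost nothing but repeated blocks, which is what
makes the penguin visible. Once the data is compressed there is far less
block-level repetition left for ECB to expose.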

For encryption purposes, a Firebird page has more in common with a jpeg
image than with a bitmap. Nothing except the page header (which is
boring) is aligned, and SQZ eliminates repeating values. Even if
encoded with ECB (which I do not advocate), the potential information
leakage is essentially zip. Suppose, for example, you were trying to
find instances of the string "Starkey". The string can fall anywhere
within two consecutive blocks (32 bytes assuming AES), where the
remaining 25 bytes (24 if you discount the run length encoding byte)
are extremely unlikely to be constant. This means that there are 2^192
possible pairs of adjacent cryptotext blocks that can contain
"Starkey". And, since there is no way of knowing where on the page a
record might be, you wouldn't even know where to start. It certainly
would be possible to find and analyze TIPs and PIPs, but that's about
it. This isn't an argument for ECB, just that CBC doesn't really buy
anything. But since CBC is dirt cheap, there's no down side either.
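
The arithmetic behind the 2^192 figure, spelled out as a small Python
check. The variable names are invented for the example; the numbers are
the ones given above.

```python
BLOCK = 16                       # AES block size in bytes
window = 2 * BLOCK               # "Starkey" may straddle two adjacent blocks
target = b"Starkey"

unknown = window - len(target)   # bytes surrounding the string: 25
unknown_no_rle = unknown - 1     # discounting the run length encoding byte: 24
bits = unknown_no_rle * 8        # 192 bits of unknown content

print(unknown, unknown_no_rle, bits)
print(f"2^{bits} = {2 ** bits:.3e}")   # roughly 6.277e+57
```

This counts only the unknown surrounding bytes for one placement of the
string; the string's offset within the 32-byte window is a further
unknown on top of that.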

This brings us to the CBC initialization vector. Does it matter for
page encryption? No, not a bit. It doesn't increase the strength of
the cipher, so it isn't really part of the crypto-system and isn't part
of the "shared secret", which is the foundation of all cryptology. Its
purpose is to guard against information leakage. However, the
variability in the page header does the same thing. So there is no
point in losing sleep trying to be clever about the initialization
vector.
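
A small sketch of why header variability stands in for the IV. Again
toy_encrypt_block is a hypothetical, insecure stand-in for AES, and
fake_page is an invented layout (a 16-byte "header" holding just a page
number, then identical body data), not the real Firebird page format.
Both pages are encrypted with the same key and the same all-zero IV.

```python
import hashlib

BLOCK = 16


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for AES: a 4-round Feistel permutation. NOT secure."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:8]
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> list:
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        mixed = bytes(x ^ y for x, y in zip(plaintext[i:i + BLOCK], prev))
        prev = toy_encrypt_block(key, mixed)
        out.append(prev)
    return out


def fake_page(page_number: int) -> bytes:
    # Invented layout: a 16-byte "header" holding only a page number,
    # followed by three blocks of identical body data.
    header = page_number.to_bytes(4, "little") + bytes(BLOCK - 4)
    return header + b"same body data.." * 3


key, fixed_iv = b"demo key", bytes(BLOCK)   # same all-zero IV for every page
page_a = cbc_encrypt(key, fixed_iv, fake_page(1))
page_b = cbc_encrypt(key, fixed_iv, fake_page(2))

differing = sum(a != b for a, b in zip(page_a, page_b))
print(differing, "of", len(page_a), "cryptotext blocks differ")
```

Because the first block differs, the first cryptotext block differs, and
the chaining carries that difference into every later block even though
the bodies are byte-for-byte identical and the IV never changes.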

The line protocol is a different matter entirely. The packets are more
deterministic and have less infrastructure than data pages, and aren't
compressed (last time I looked, at least). Under careful analysis, it
is distinctly possible that useful information might leak. Here, CBC is
important. But is the initialization vector significant? If there were
any chance that two sessions would play out identically, perhaps. But
the nature of the protocol makes this virtually impossible.


--
Jim Starkey
Founder, NimbusDB, Inc.
978 526-1376