Subject: Re: [Firebird-Architect] Re: Record Encoding
Author: Jim Starkey
Roman Rokytskyy wrote:

>But you propose to change the storage format so that it decreases
>performance for my application. I object. :)
Hey, I haven't proposed changing Firebird. At least not yet. I am
giving it deep and serious consideration in my other life.

But before you get too upset, let's figure out the cost first. ZLib
encoding allows lots of tradeoffs, including a choice of static or
dynamic Huffman encoding and backward references for repeating
sequences. The tradeoffs are mostly on the encoding side. Decoding is
more or less linear.
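Those knobs are visible from Java as well. A minimal sketch (the class name and sample data are mine, not from the thread) using java.util.zip, which wraps zlib: HUFFMAN_ONLY disables the backward-reference (LZ77) matching and leaves pure Huffman coding, while DEFAULT_STRATEGY keeps both, so the two can be compared on a repetitive record.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical probe comparing zlib encoding tradeoffs exposed by java.util.zip.
public class ZlibTradeoffs {

    // Compress input at the given level with the given strategy.
    static byte[] deflate(byte[] input, int level, int strategy) {
        Deflater d = new Deflater(level);
        d.setStrategy(strategy);
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.toByteArray();
    }

    // Decompress a zlib stream back to its original bytes.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inf = new Inflater();
        inf.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inf.finished()) {
            int n = inf.inflate(buf);
            if (n == 0 && inf.needsInput()) break; // guard against truncated input
            out.write(buf, 0, n);
        }
        inf.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        // Highly repetitive sample record: backward references should win big here.
        byte[] record = new byte[8192];
        for (int i = 0; i < record.length; i++) {
            record[i] = (byte) "firebird".charAt(i % 8);
        }
        int withRefs = deflate(record, Deflater.BEST_COMPRESSION,
                               Deflater.DEFAULT_STRATEGY).length;
        int huffOnly = deflate(record, Deflater.BEST_COMPRESSION,
                               Deflater.HUFFMAN_ONLY).length;
        System.out.println("with back-references: " + withRefs + " bytes");
        System.out.println("Huffman only:         " + huffOnly + " bytes");
    }
}
```

On data with real repeating sequences the back-reference variant should come out far smaller; on near-random bytes the gap closes, which is exactly the tradeoff space the encoder gets to play in.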

At the moment Firebird is compute bound contemplating its navel and
doing silly unnecessary things like checking to see if a static
structure has changed in the last microsecond. When we clean that
up, it will go back to being disk bound. At that point, you'd be
delighted to trade less time in the idle loop for fewer page reads and
higher cache efficiency. That's when more sophisticated compression
makes sense.

But you have to admit that for records, an encoding scheme better than
the current compression is compelling.

You're a Java guy with builtin zip support. Why don't you try inflating
and deflating your favorite 20,000 blobs to get a handle on the costs
of compression and decompression and some idea of the efficiency.
The Sun zip classes are designed to interface directly with zlib, so the
numbers should reasonably reflect the likely costs. I was utterly blown
away when I actually timed DES encryptions on modern machines. If the
cpu cycles required to inflate a blob are less than the cycles to fetch
the additional pages necessary for uncompressed storage, you'll have to
agree that it's really a performance enhancer.
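The suggested experiment might look like the sketch below (class name, blob sizes, and counts are mine, and the blobs are synthetic; real numbers require real data): deflate and inflate a batch of blobs, then report throughput so the CPU cost can be weighed against the page reads compression saves.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical benchmark: measure deflate/inflate throughput on synthetic blobs.
public class ZlibTiming {

    static byte[] deflate(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inf = new Inflater();
        inf.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inf.finished()) {
            int n = inf.inflate(buf);
            if (n == 0 && inf.needsInput()) break; // guard against truncated input
            out.write(buf, 0, n);
        }
        inf.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        int blobCount = 200, blobSize = 64 * 1024;
        // Half low-entropy, half zero bytes: neither trivially nor incompressible.
        byte[] blob = new byte[blobSize];
        Random rnd = new Random(42);
        for (int i = 0; i < blobSize; i += 2) blob[i] = (byte) rnd.nextInt(16);

        long t0 = System.nanoTime();
        byte[] packed = null;
        for (int i = 0; i < blobCount; i++) packed = deflate(blob);
        long deflateNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int i = 0; i < blobCount; i++) inflate(packed);
        long inflateNs = System.nanoTime() - t0;

        double mb = blobCount * (double) blobSize / (1 << 20);
        System.out.printf("deflate: %.1f MB/s, inflate: %.1f MB/s, ratio %.2f%n",
                mb / (deflateNs / 1e9), mb / (inflateNs / 1e9),
                (double) packed.length / blobSize);
    }
}
```

Dividing inflate throughput into the disk's read bandwidth gives a rough break-even point: if decompressing a page's worth of blob costs fewer cycles than fetching the extra pages it would otherwise occupy, compression is a net win.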

Historical note: I was working for DEC's disk engineering group while
productizing the first JRD. They sold disks and didn't want to hear
anything about compression. I had to sell them compression as a
performance feature. It worked for them. Maybe it will work with Roman.

>Also fetching all blobs into memory is not so easy. I do not know, if
>you're aware of it, but Sun JDK 1.4.x JVM can reference less than 1,5
>GB RAM (if I'm not wrong, that is only 1 GB). My application needs
>around 800 MB to support 150 simultaneous users, so for my blobs only
>200 MB left. Sorry, that is not too much.
I suspect you're wrong. They probably use a 32 bit object space, but
the objects themselves are outside the object space. They may be doing
something they consider clever in garbage collecting by having two
heaps. In any case, you can always spend an extra $49.95 and buy a 64
bit machine.

>Fetch 2 MB blob into memory in order to get 4k block from it? Not very
>efficient, isn't it?
If one blob fetch out of 100,000 is for an interior segment, yes it is.
Almost nobody does that sort of thing for a number of reasons, the most
important of which is that it performs like a pig across a wire.

>As to the type system, JDBC distinguishes BLOB, CLOB and BINARY data
>types. I have no problem of having BLOB compressed, CLOB containing
>charset info, but please give me BINARY with no compression, just as
>it is implemented at present time.
If you can't tell the difference, why should you care?


Jim Starkey
Netfrastructure, Inc.
978 526-1376