Subject | Re: [Firebird-Architect] Re: Record Encoding
---|---
Author | Jim Starkey |
Post date | 2005-05-12T22:06:32Z |
Roman Rokytskyy wrote:
>But you propose to change the storage format so that it decreases
>performance for my application. I object. :)
>
>
Hey, I haven't proposed changing Firebird. At least not yet. I am
giving it deep and serious consideration in my other life.
But before you get too upset, let's figure out the cost first. ZLib
encoding allows lots of tradeoffs, including a choice of static or
dynamic Huffman encoding and backward references for repeating
sequences. The tradeoffs are mostly on the encoding side. Decoding is
more or less linear.
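To make the encoding-side knobs concrete, here's a small sketch against java.util.zip, which sits directly on zlib. The level and strategy settings below are the dials Sun exposes; zlib picks static versus dynamic Huffman per block on its own. The class name and sample data are invented for illustration.

```java
import java.util.zip.Deflater;

// Sketch: the same input deflated with different level/strategy choices.
// Higher levels search harder for backward references (more encode time,
// usually smaller output); HUFFMAN_ONLY skips the reference search entirely.
public class DeflateTradeoff {

    static int deflatedSize(byte[] data, int level, int strategy) {
        Deflater deflater = new Deflater(level);
        deflater.setStrategy(strategy);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] record = ("some repetitive record image, "
                + "some repetitive record image, "
                + "some repetitive record image").getBytes();

        System.out.println("fast, full matching:   "
                + deflatedSize(record, Deflater.BEST_SPEED, Deflater.DEFAULT_STRATEGY));
        System.out.println("best, full matching:   "
                + deflatedSize(record, Deflater.BEST_COMPRESSION, Deflater.DEFAULT_STRATEGY));
        System.out.println("Huffman only, no refs: "
                + deflatedSize(record, Deflater.BEST_COMPRESSION, Deflater.HUFFMAN_ONLY));
    }
}
```

The point to notice is that only the deflate side changes; an Inflater neither knows nor cares which level produced the stream.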
At the moment Firebird is compute bound contemplating its navel and
doing silly unnecessary things like checking to see if a static
structure has changed in the last microsecond. When we clean that
up, it will go back to being disk bound. At that point, you'd be
delighted to trade less time in the idle loop for fewer page reads and a
higher cache efficiency. That's when more sophisticated compression
makes sense.
But you have to admit that for records, an encoding scheme better than
the current compression is compelling.
You're a Java guy with built-in zip support. Why don't you try inflating
and deflating your favorite 20,000 blobs and get a handle on the costs
of compression and decompression, and some idea of the efficiency.
The Sun zip classes are designed to interface directly with zlib, so the
numbers should reasonably reflect the likely costs. I was utterly blown
away when I actually timed DES encryptions on modern machines. If the
CPU cycles required to inflate a blob are less than the cycles to fetch
the additional pages necessary for uncompressed storage, you'll have to
agree that it's really a performance enhancer.
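Something along these lines would give a first-order number (just a sketch: the class name, buffer sizes, and the synthetic filler are mine, so substitute real blob contents before trusting the result):

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Rough cost measurement: deflate a blob once, then time repeated
// inflates, which is what retrieval of a compressed blob would pay.
public class ZipCost {
    public static void main(String[] args) throws DataFormatException {
        // Stand-in for a real blob; mildly compressible filler.
        byte[] blob = new byte[64 * 1024];
        for (int i = 0; i < blob.length; i++) {
            blob[i] = (byte) (i % 97);
        }

        // Compress once; the output buffer is sized generously so a
        // single deflate() call finishes the job.
        Deflater deflater = new Deflater();
        deflater.setInput(blob);
        deflater.finish();
        byte[] compressed = new byte[blob.length + 64];
        int compressedLength = deflater.deflate(compressed);
        deflater.end();

        // Time the inflate side.
        byte[] out = new byte[blob.length];
        int passes = 1000;
        long start = System.currentTimeMillis();
        for (int i = 0; i < passes; i++) {
            Inflater inflater = new Inflater();
            inflater.setInput(compressed, 0, compressedLength);
            inflater.inflate(out);
            inflater.end();
        }
        long elapsed = System.currentTimeMillis() - start;

        System.out.println(blob.length + " bytes -> " + compressedLength
                + " bytes compressed; about " + (elapsed * 1000 / passes)
                + " microseconds per inflate");
    }
}
```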
Historical note: I was working for DEC's disk engineering group while
productizing the first JRD. They sold disks and didn't want to hear
anything about compression. I had to sell them compression as a
performance feature. Worked for them. Maybe it will work with Roman.
>Also fetching all blobs into memory is not so easy. I do not know, if
>you're aware of it, but Sun JDK 1.4.x JVM can reference less than 1,5
>GB RAM (if I'm not wrong, that is only 1 GB). My application needs
>around 800 MB to support 150 simultaneous users, so for my blobs only
>200 MB left. Sorry, that is not too much.
>
>
I suspect you're wrong. They probably use a 32 bit object space, but
the objects themselves are outside the object space. They may be doing
something they consider clever in garbage collecting by having two
heaps. In any case, you can always spend an extra $49.95 and buy a 64
bit machine.
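For what it's worth, and this is only an illustrative sketch rather than a claim about what Sun's collector does internally, NIO in JDK 1.4 already lets an application park large blob contents outside the garbage-collected object heap with direct buffers (class name and sizes below are made up):

```java
import java.nio.ByteBuffer;

// Illustrative only: a direct buffer's storage lives outside the normal
// Java object heap, so large blob contents held this way don't compete
// with ordinary objects for the collected heap in the same way.
public class OffHeapBlob {
    public static void main(String[] args) {
        ByteBuffer blob = ByteBuffer.allocateDirect(2 * 1024 * 1024);
        blob.put((byte) 42);          // write into native memory
        blob.flip();
        System.out.println("direct? " + blob.isDirect()
                + ", capacity " + blob.capacity() + " bytes");
    }
}
```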
>If one blob fetch out of 100,000 is for an interior segment, yes it is.
>
>Fetch 2 MB blob into memory in order to get 4k block from it? Not very
>efficient, isn't it?
>
>
Almost nobody does that sort of thing for a number of reasons, the most
important of which is that it performs like a pig across a wire.
>As to the type system, JDBC distinguishes BLOB, CLOB and BINARY data
>types. I have no problem of having BLOB compressed, CLOB containing
>charset info, but please give me BINARY with no compression, just as
>it is implemented at present time.
>
>
>
If you can't tell the difference, why should you care?
--
Jim Starkey
Netfrastructure, Inc.
978 526-1376