Subject Re: [Firebird-Architect] UTF-8 and Compression
Author Ann W. Harrison
Arno Brinkman wrote:
> The "new" compression is done on the whole record, right? Wouldn't
> this have the side effect that a record fragment becomes much
> bigger (compared to now) if I change only one field value in a record?

I'm having one of those days where my brain could easily be on spin
cycle, but I don't follow you. There are three things going around:
record compression, fragmentation, and difference records.

Working backward, a difference record (aka delta) is a formula,
which, when applied to a fully expanded primary record, produces the
previous version of the record. If you change the value of an integer
in the record from 11 to 12, the delta contains something like 24, 1, 11
where 24 is the offset of the byte that changed, 1 is the length of the
change, and 11 is the old value. So, as far as I see, compression
algorithms have very little to do with deltas.
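
To make the delta idea concrete, here is a rough sketch in Python. It is
not the engine's code or its on-disk delta format; the function names and
the (offset, length, old bytes) tuple layout are made up for illustration,
and it assumes both versions of the record have the same expanded length.

```python
from typing import List, Tuple

# One delta entry: (offset of the change, length of the change, old bytes).
# This layout is an assumption for illustration only.
Delta = List[Tuple[int, int, bytes]]


def make_delta(previous: bytes, current: bytes) -> Delta:
    """Record what must be patched into `current` to get `previous` back."""
    if len(previous) != len(current):
        raise ValueError("sketch assumes equal-length expanded records")
    delta: Delta = []
    i, n = 0, len(current)
    while i < n:
        if previous[i] != current[i]:
            start = i
            # Extend over the whole stretch of consecutive changed bytes.
            while i < n and previous[i] != current[i]:
                i += 1
            delta.append((start, i - start, previous[start:i]))
        else:
            i += 1
    return delta


def apply_delta(current: bytes, delta: Delta) -> bytes:
    """Apply the delta to the expanded current record, giving the old version."""
    buf = bytearray(current)
    for offset, length, old in delta:
        buf[offset:offset + length] = old
    return bytes(buf)


# The example from the text: the byte at offset 24 changes from 11 to 12.
old_version = bytes(24) + bytes([11]) + bytes(3)
new_version = bytes(24) + bytes([12]) + bytes(3)
delta = make_delta(old_version, new_version)
```

Both functions work on fully expanded bytes, so the compression scheme
never enters into it.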

Record fragments are what happens when a record, when compressed,
doesn't fit on the page where it belongs. Maybe it doesn't fit on any
page. Maybe it is a new version that doesn't compress as well as the
original. In that case, the part that fits goes on the target page with
a pointer in the record header to the page and index offset of the tail of
the record. The data is compressed before fragmentation... So
different compression algorithms don't seem to make much difference.
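
Roughly, and only as a sketch: the Python below splits an already
compressed record into a head that fits on the target page and a chain of
tail fragments. The Fragment class, the next_fragment field, and the two
"room" parameters are hypothetical simplifications. A real pointer would be
a page number plus an index offset, and record headers take space that is
ignored here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    data: bytes
    # Position of the fragment holding the rest of the record, or None if
    # this is the last piece. Stands in for "page and index offset".
    next_fragment: Optional[int]


def fragment_record(compressed: bytes, room_on_target: int,
                    room_per_page: int) -> List[Fragment]:
    """Split compressed record bytes: head on the target page, tail elsewhere."""
    if room_on_target <= 0 or room_per_page <= 0:
        raise ValueError("need some free space on every page")
    pieces = [compressed[:room_on_target]]
    pos = room_on_target
    while pos < len(compressed):
        pieces.append(compressed[pos:pos + room_per_page])
        pos += room_per_page
    last = len(pieces) - 1
    return [Fragment(piece, i + 1 if i < last else None)
            for i, piece in enumerate(pieces)]
```

The input is whatever bytes the compressor produced, so the splitting step
does not care which algorithm produced them. Only the compressed length
matters.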

Record compression is what happens to a record between the time it's
sitting fat and happy, fully expanded in its record buffer and the
moment when it gets sent to disk. Currently we do a one-byte run-length
compression, front to back, starting with the first byte after the
record header. A more expensive compression algorithm would reduce the
amount of data stored except in some pathological cases.
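
For anyone who hasn't looked at run-length compression lately, here is a
minimal sketch in Python. The encoding is an assumption chosen for
illustration (a signed control byte: a positive count means that many
literal bytes follow, a negative count means the next byte repeats that
many times) and should not be read as the exact on-disk format.

```python
def rle_compress(data: bytes) -> bytes:
    """One-byte run-length compression, front to back."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (at most 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 3:
            out.append((-run) & 0xFF)  # negative count stored as a signed byte
            out.append(data[i])
            i += run
        else:
            # Collect literals until a run of 3 starts or 127 are gathered.
            start = i
            while i < n and i - start < 127:
                if i + 2 < n and data[i] == data[i + 1] == data[i + 2]:
                    break
                i += 1
            out.append(i - start)      # positive count of literal bytes
            out += data[start:i]
    return bytes(out)


def rle_decompress(packed: bytes) -> bytes:
    """Inverse of rle_compress."""
    out = bytearray()
    i = 0
    while i < len(packed):
        control = packed[i]
        i += 1
        if control >= 128:             # negative signed byte: a run
            out += bytes([packed[i]]) * (256 - control)
            i += 1
        else:                          # positive: copy literal bytes
            out += packed[i:i + control]
            i += control
    return bytes(out)
```

With this encoding a record made of 20 zero bytes, the text "abc", and 30
trailing blanks packs into 8 bytes. The pathological case is data with no
runs at all: 100 distinct bytes come out as 101, because the control byte
is pure overhead.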

Or maybe you meant something completely different?