Subject | RE: [IB-Architect] Next ODS change (was: System table change)
---|---
Author | Ann Harrison |
Post date | 2000-11-07T21:55:33Z |
At 11:48 AM 11/7/2000 -0700, David Berg wrote:
>On the subject of compression algorithms, have you considered dictionary
>based schemes?

Charlie told me that measurements showed that computing a checksum
for each page immediately before writing and immediately after reading
took 10% of the server CPU time. Since partial writes are very rare,
they stopped checksumming pages. I think about that every time I
start thinking about elaborate compression or encryption schemes.
>First, it would be particularly valuable in dealing with large
>text objects, such as XML and HTML pages (which have lots of repetition).

As a blob filter, dictionary-based compression would be great and can
be added at any time.
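
Just to make the blob-filter idea concrete: a dictionary/LZ-style compressor
such as zlib could do the work, with the filter calling something like the
sketch below on each segment. This is plain zlib, not an actual filter
declaration, and compress_segment is an invented name:

    #include <stdlib.h>
    #include <zlib.h>

    /* Compress one blob segment with zlib, an LZ77 (dictionary-style) scheme.
     * Returns a malloc'd buffer and stores the compressed length in *out_len,
     * or NULL on failure.  The read side of the filter would call uncompress(). */
    unsigned char *compress_segment(const unsigned char *data, uLong len,
                                    uLong *out_len)
    {
        uLongf bound = compressBound(len);
        unsigned char *out = malloc(bound);

        if (out == NULL)
            return NULL;
        if (compress(out, &bound, data, len) != Z_OK)
        {
            free(out);
            return NULL;
        }
        *out_len = bound;
        return out;
    }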
>Next down, dictionary compression of records would allow for common
>repeating patterns to be compressed together. For example, if most men pay
>with a visa card, and the fields SEX, CHARGE TYPE, and CARD# are in
>sequence, then a dictionary might treat "MVISA 4" as a single dictionary
>token (4 is the first digit of all Visa cards). Likewise women who pay cash
>would compress "FCASH " as a single token. (No sexual
>stereotypes intended)

What InterBase currently uses is a run-length compression which is
cheap to compute. It uses a signed byte to indicate length. A positive
value of <n> shows that the next <n> characters are part of the record.
A negative value of <n> shows that the next character (byte value) should
be repeated <n> times.
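
For anyone who hasn't read that code, a minimal sketch of a decoder for that
scheme - not the actual InterBase routine, and rle_expand is an invented name:

    #include <stddef.h>
    #include <string.h>

    /* Expand the signed-byte run-length scheme described above.
     * A positive control byte n means: copy the next n bytes verbatim.
     * A negative control byte n means: repeat the following byte -n times.
     * Returns the number of bytes written to out (0 on malformed input). */
    size_t rle_expand(const signed char *in, size_t in_len,
                      unsigned char *out, size_t out_max)
    {
        size_t ip = 0, op = 0;

        while (ip < in_len)
        {
            int n = in[ip++];

            if (n > 0)                              /* literal run */
            {
                if (ip + n > in_len || op + n > out_max)
                    return 0;
                memcpy(out + op, in + ip, n);
                ip += n;
                op += n;
            }
            else if (n < 0)                         /* repeated byte */
            {
                if (ip >= in_len || op + (size_t)(-n) > out_max)
                    return 0;
                memset(out + op, (unsigned char) in[ip++], -n);
                op += (size_t)(-n);
            }
            else
                return 0;                           /* zero-length control byte */
        }
        return op;
    }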
If the length indicator were a word, an empty varchar(32000) would be
stored as three bytes, rather than the current 300 or so. The algorithm
might use the high order bit to indicate whether the indicator was one
byte or two and the next bit to indicate a run of some character. Slightly
less good for long empty strings, better for short empty strings.
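
One possible reading of that control byte, just as a sketch - the exact bit
positions are my assumption, not something pinned down above:

    /* Proposed control byte:
     *   bit 7    - set when the length needs a second byte (14-bit length)
     *   bit 6    - set when the directive is a run of one repeated character
     *   bits 5-0 - the length (or its high six bits in the two-byte form)
     * Returns the number of control bytes written (1 or 2), or 0 when the
     * length is too large for one directive and the caller must split it. */
    unsigned emit_control(unsigned char *out, unsigned len, int is_run)
    {
        if (len < 64)
        {
            out[0] = (unsigned char) ((is_run ? 0x40 : 0x00) | len);
            return 1;
        }
        if (len < 16384)
        {
            out[0] = (unsigned char) (0x80 | (is_run ? 0x40 : 0x00) | (len >> 8));
            out[1] = (unsigned char) (len & 0xFF);
            return 2;
        }
        return 0;
    }

Under that layout a single run tops out at 16383 bytes, so a completely empty
varchar(32000) would take two run directives (six bytes) rather than the three
a plain two-byte count gives, while a short empty string needs only a one-byte
control plus the repeated character - which is the trade-off described above.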
>For example a field declared as "Char(20) Dictionary(8)" would appear to the
>outside world as a 20 character Alpha, but the internal storage would be an
>8 bit reference to a dictionary (thus allowing up to 255 unique 20 character
>values in the field). (You can actually think of this as two tables with an
>implied updatable join, although I suspect the performance would be faster
>using special purpose dictionary code.)

Interesting.
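
To spell out what that would mean in code, here is a toy version of such a
field-level dictionary. The 255-value limit comes from David's example; the
names are invented and none of this is InterBase code:

    #include <string.h>

    #define FIELD_LEN  20
    #define DICT_SLOTS 255              /* 8-bit codes; 0 reserved for "no code" */

    struct field_dict {
        char     values[DICT_SLOTS][FIELD_LEN];
        unsigned count;                 /* number of slots in use */
    };

    /* Return the 1-based code for value, adding it to the dictionary if needed.
     * Returns 0 when the dictionary is full and the value is new, in which
     * case the caller would have to store the field uncompressed. */
    unsigned char dict_encode(struct field_dict *d, const char value[FIELD_LEN])
    {
        unsigned i;

        for (i = 0; i < d->count; i++)
            if (memcmp(d->values[i], value, FIELD_LEN) == 0)
                return (unsigned char) (i + 1);

        if (d->count == DICT_SLOTS)
            return 0;

        memcpy(d->values[d->count], value, FIELD_LEN);
        return (unsigned char) ++d->count;
    }

    /* Expand a stored code back into the 20-character value, or NULL. */
    const char *dict_decode(const struct field_dict *d, unsigned char code)
    {
        return (code == 0 || code > d->count) ? NULL : d->values[code - 1];
    }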
>One of the particularly neat things about field based dictionaries (and
>record based compression in general) is that they can compress keys just as
>effectively. And the more keys stored on a single memory page, the faster
>the lookup.

Ummm. Right. It is important that index keys (in their
key incarnation - not as they appear in the row) sort correctly byte-wise.
Prefix compression works exceedingly well for index keys ... Check the
stats on almost any index - the average key size will be much shorter than
the declared key length.
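
For illustration, prefix compression in its simplest form: each key stores
only the count of leading bytes it shares with the previous key on the page,
plus its differing tail. This is a sketch of the general technique, not the
InterBase B-tree code:

    #include <stddef.h>
    #include <string.h>

    /* Prefix-compress key against the previous key on the page: store the
     * count of leading bytes shared with prev, the length of the remaining
     * tail, then the tail itself.  Both counts are assumed to fit in a byte.
     * This only works because the keys are kept in byte-wise sorted order. */
    size_t prefix_compress(const unsigned char *prev, size_t prev_len,
                           const unsigned char *key, size_t key_len,
                           unsigned char *out)
    {
        size_t common = 0;

        while (common < prev_len && common < key_len &&
               common < 255 && prev[common] == key[common])
            common++;

        out[0] = (unsigned char) common;             /* shared prefix length */
        out[1] = (unsigned char) (key_len - common); /* stored tail length   */
        memcpy(out + 2, key + common, key_len - common);
        return 2 + (key_len - common);
    }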
My thoughts, anyway.
Regards,
Ann