Subject | RE: [IB-Architect] Next ODS change (was: System table change)
---|---
Author | David Berg
Post date | 2000-11-07T18:48Z
On the subject of compression algorithms, have you considered dictionary-based
schemes? I've always thought that a database built around dictionary-based
compression could store data in as little as 1-5% of the space of a normal
database, thus increasing the amount of data that can be kept in memory by a
factor of 20-100x, with a corresponding increase in performance (reduced disk
access).
First, it would be particularly valuable in dealing with large text objects,
such as XML and HTML pages (which have lots of repetition).
Next down, dictionary compression of records would allow common repeating
patterns to be compressed together. For example, if most men pay with a Visa
card, and the fields SEX, CHARGE TYPE, and CARD# are in sequence, then a
dictionary might treat "MVISA 4" as a single dictionary token (4 is the first
digit of all Visa cards). Likewise, women who pay cash would compress
"FCASH " as a single token. (No sexual stereotypes intended.)
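To make that concrete, here's a minimal sketch of what such a record-pattern
dictionary might look like. Everything here (the token values, the escape
convention, the patterns themselves) is illustrative, not InterBase internals:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record-pattern dictionary: frequent field sequences are
 * stored once and referenced by a one-byte token. Token 0 means
 * "literal bytes follow". */
static const char *dict[] = {
    NULL,            /* 0: literal escape */
    "MVISA 4",       /* 1: male, Visa, leading card digit */
    "FCASH ",        /* 2: female, cash */
};
#define DICT_SIZE (sizeof dict / sizeof dict[0])

/* Encode a record prefix: emit a token if it matches a dictionary
 * entry, otherwise an escape byte followed by the raw bytes.
 * Returns the number of bytes written to out. */
static size_t encode(const char *rec, char *out)
{
    for (size_t i = 1; i < DICT_SIZE; i++) {
        size_t len = strlen(dict[i]);
        if (strncmp(rec, dict[i], len) == 0) {
            out[0] = (char)i;            /* one byte replaces len bytes */
            strcpy(out + 1, rec + len);  /* copy the uncompressed tail */
            return 1 + strlen(rec + len);
        }
    }
    out[0] = 0;                          /* no match: literal escape */
    strcpy(out + 1, rec);
    return 1 + strlen(rec);
}

int main(void)
{
    char buf[64];
    size_t n = encode("MVISA 4111222233334444", buf);
    printf("compressed %zu bytes to %zu\n",
           strlen("MVISA 4111222233334444"), n);
    return 0;
}
```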
Dictionary compression of individual fields can be particularly efficient
when looking at text representations of enumerated values (e.g. references
to a lookup table).
In fact, you could go a step further and allow a field to be declared as
"dictionary based". This would mean that all values in the field were
actually dictionary references, thus avoiding the need for any type of flag
in the field to indicate its dictionary nature.
For example, a field declared as "Char(20) Dictionary(8)" would appear to the
outside world as a 20-character alpha field, but the internal storage would be
an 8-bit reference to a dictionary (thus allowing up to 255 unique
20-character values in the field). (You can actually think of this as two
tables with an implied updatable join, although I suspect the performance
would be faster using special-purpose dictionary code.)
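A minimal sketch of how such a declared dictionary field might be laid out,
assuming one byte per row and a reserved slot for NULL (which is one way to
arrive at the 255-value limit). The names and layout are hypothetical, not
InterBase's actual storage:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sketch of a "CHAR(20) DICTIONARY(8)" column: each row stores a
 * one-byte index into a table of unique 20-character values.
 * Index 0 is reserved for NULL, leaving 255 distinct values. */
#define WIDTH 20

struct dict_column {
    char    values[256][WIDTH + 1];  /* unique values; slot 0 = NULL  */
    int     count;                   /* occupied slots, including 0   */
    uint8_t rows[1000];              /* per-row dictionary references */
    int     nrows;
};

/* Find or add a value; return its one-byte dictionary reference. */
static uint8_t intern(struct dict_column *c, const char *v)
{
    for (int i = 1; i < c->count; i++)
        if (strcmp(c->values[i], v) == 0)
            return (uint8_t)i;
    /* a real engine would fail or widen the field past 255 values */
    strncpy(c->values[c->count], v, WIDTH);
    return (uint8_t)c->count++;
}

int main(void)
{
    struct dict_column col = { .count = 1 };
    col.rows[col.nrows++] = intern(&col, "VISA");
    col.rows[col.nrows++] = intern(&col, "CASH");
    col.rows[col.nrows++] = intern(&col, "VISA");  /* reuses entry 1 */
    /* decoding a row is a plain array lookup -- the "implied join" */
    printf("row 2 decodes to %s\n", col.values[col.rows[2]]);
    return 0;
}
```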
One of the particularly neat things about field-based dictionaries (and
record-based compression in general) is that they can compress keys just as
effectively. And the more keys stored on a single memory page, the faster
the lookup.
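A quick back-of-envelope on that last point, assuming a 4K page and a 4-byte
record pointer (both assumptions, not InterBase's real index layout):
replacing a 20-byte key with a 1-byte dictionary token raises the per-page
fan-out roughly 5x, which shaves about a level off the tree:

```c
#include <math.h>
#include <stdio.h>

/* Illustrative fan-out arithmetic only; page and entry sizes are
 * assumed, not taken from the actual on-disk structure. */
int main(void)
{
    double page = 4096.0;
    double raw  = 20.0 + 4.0;  /* 20-byte key + record pointer      */
    double dic  = 1.0 + 4.0;   /* 1-byte dictionary token + pointer */
    double n    = 1e8;         /* keys in the index                 */
    printf("entries/page: %.0f vs %.0f\n", page / raw, page / dic);
    printf("tree depth for %.0e keys: %.1f vs %.1f\n",
           n, log(n) / log(page / raw), log(n) / log(page / dic));
    return 0;
}
```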
-----Original Message-----
From: Ann Harrison [mailto:harrison@...]
Sent: Tuesday, November 07, 2000 9:44 AM
To: IB-Architect@egroups.com; 'IB-Architect@egroups.com'
Subject: Re: [IB-Architect] Next ODS change (was: System table change)
At 11:10 AM 11/7/2000 -0500, Leyne, Sean wrote:
>On the subject of the next ODS change, what would it take to increase
>the namespace size of the DB object from 31 to, say, 63 characters?
Personally, I'd go for varchar[255].
Other things to consider for the ODS change include:
1) Drop page checksum
2) Add page generation at bottom of page for a consistency check.
3) Increase max index key size from 255 to 64K.
4) Promote dbKey in non-unique indexes to eliminate duplicates.
5) Add FK root page (like IRT) so defining foreign keys doesn't
require exclusive access.
6) Improve compression algorithm for large fields.
I'm sure there are more.
Regards,
Ann