Subject | Re: [Firebird-Architect] Record Encoding
---|---
Author | Arno Brinkman
Post date | 2005-05-12T19:05:06Z
Hi,
> As an experiment, I tried encoding all records (excluding blobs) in one
> of my production databases. The current record structure is similar to
> Firebird, though run length encoding is based on two byte units rather
> than one.

Why two-byte units? I don't think this will help much for most data.
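To make the one-byte vs. two-byte question concrete, here is a toy run-length encoder over fixed-size units (my own illustration, not Firebird's actual format): each run is stored as a count byte followed by one unit. On a padded CHAR column full of blanks, both unit sizes collapse the run, but the two-byte unit always pays one extra byte per run and can only match runs that repeat on unit boundaries.

```python
# Toy RLE over fixed-size units, to compare one-byte vs two-byte runs.
# Run format: one count byte, then one unit. (Illustration only, not
# Firebird's real compressor.)

def rle(data: bytes, unit: int) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        chunk = data[i:i + unit]
        n = 1
        # Extend the run while the next unit repeats the current one.
        while n < 255 and data[i + n * unit : i + (n + 1) * unit] == chunk:
            n += 1
        out.append(n)
        out += chunk
        i += n * unit
    return bytes(out)

blanks = b" " * 100                  # e.g. trailing blanks in a CHAR field
assert len(rle(blanks, 1)) == 2      # 100 x " "  -> count byte + 1 data byte
assert len(rle(blanks, 2)) == 3      # 50 x "  "  -> count byte + 2 data bytes
```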
> The database may be more textual than some, but with a good
> sprinkling of numeric data. Virtually all primary and foreign keys are
> 32-bit integers generated by sequences. There are no wildly
> overspecified fields. Since Netfrastructure has fewer semantic
> differences between blobs and text, the scheme is probably slightly
> more blob intensive than a Firebird database. A significant difference
> in data demographics, however, is that fixed length strings are all
> but unheard of in Netfrastructure.
>
> Number of records: 676,643
> Current compressed size (on disk): 74,793,858
> Encoded size (on disk): 46,342,823
> Current decompressed size (in memory): 206,762,788
> Encoded size (in memory): 58,663,007

Sounds interesting; the encoded data already gets more compression than regular RLE (two bytes). Just to be sure I understood your encoding proposal: a value with datatype "64-bit integer" which holds the decimal value "100" takes up only two bytes (1 for the type and 1 for the data)?
> The difference between on-disk and in-memory encoded sizes is a vector
> of 16 bit words containing known offsets of physical fields within the
> record encoding.

And this depends on the fields read from the record, because the offsets are set based on need.
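The lazy offset vector could look something like this sketch (the one-length-byte-per-field layout and the class are mine, purely for illustration): offsets are filled in only as far as a read actually requires, so fields that are never touched are never decoded.

```python
# Hypothetical sketch: a 16-bit offset vector filled in lazily, so only
# fields that are actually read pay the cost of walking the encoding.

class EncodedRecord:
    def __init__(self, data: bytes, field_count: int):
        self.data = data
        # Known offsets; -1 marks "not yet computed". Only offset 0 is known.
        self.offsets = [-1] * (field_count + 1)
        self.offsets[0] = 0

    def _field_length(self, off: int) -> int:
        # Assumed toy layout: one length byte, then that many data bytes.
        return 1 + self.data[off]

    def field(self, i: int) -> bytes:
        # Walk forward from the last known offset, caching as we go.
        k = i
        while self.offsets[k] < 0:
            k -= 1
        while k < i:
            self.offsets[k + 1] = self.offsets[k] + self._field_length(self.offsets[k])
            k += 1
        off = self.offsets[i]
        return self.data[off + 1 : off + self._field_length(off)]

rec = EncodedRecord(b"\x03abc\x01x\x02yz", 3)
assert rec.field(2) == b"yz"  # offsets for fields 0 and 1 cached on the way
```

Reading field 2 computes and caches offsets 1 and 2; a record whose later fields are never read keeps most of its vector empty.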
> Run length encoding on top of data stream encoding looks like a waste
> of time. Other than trailing blanks in fixed length strings and
> significant runs of null, nothing is likely to repeat.

In fact, the encoding is already taking away the repeating values. :)
> I do think that
> if an appropriate scheme can be found, additional on-disk compression
> would be a benefit, especially for character sets that map into
> multi-byte utf-8 characters. I'm taking another look at RFC 1951
> (DEFLATE) compression to see if it or a variant might do the trick.

The second advantage I see with your record encoding is that delta-versions also stay small (the same as with RLE). When compression is done over a whole record, the delta-versions would probably grow compared with the current sizes. Anyway, I think that compression for blobs is definitely needed.
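As a quick way to gauge what DEFLATE buys on multi-byte UTF-8 text, one can experiment with any RFC 1951 implementation; this sketch uses Python's zlib module purely for illustration (it is not Firebird code, and the sample text is made up):

```python
import zlib

# Multi-byte UTF-8 text (Greek here) compresses well under DEFLATE
# because the LZ77 window matches repeated byte sequences and the
# Huffman stage exploits the skewed byte distribution of one script.
text = ("η βάση δεδομένων " * 50).encode("utf-8")

packed = zlib.compress(text, 9)     # zlib wrapper around DEFLATE (RFC 1951)
assert zlib.decompress(packed) == text
assert len(packed) < len(text)      # repetitive UTF-8 shrinks substantially
```

The flip side, as noted above, is exactly the delta-version problem: a whole-record DEFLATE stream has no stable byte positions, so a small field change can alter the entire compressed image.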
Regards,
Arno Brinkman
ABVisie
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Firebird open source database (based on IB-OE) with many SQL-99 features :
http://www.firebirdsql.org
http://www.firebirdsql.info
http://www.fingerbird.de/
http://www.comunidade-firebird.org/
Support list for Interbase and Firebird users :
firebird-support@yahoogroups.com
Nederlandse firebird nieuwsgroep :
news://newsgroups.firebirdsql.info