Subject | Re: [firebird-support] Re: CHAR Vs VARCHAR vs Smallint for storing int value from 0 to 255
---|---
Author | Ann W. Harrison
Post date | 2010-06-09T18:56:52Z
svanderclock wrote:
> so as i understand, the compression is done also between field ?

Yes, the run length compression is applied to the entire record
buffer, not to individual fields.
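As a rough sketch of why compressing the whole record buffer pays off: a run of padding or zero bytes that spans a field boundary collapses into a single (count, byte) pair. This is a toy run-length encoder for illustration, not Firebird's actual on-disk format.

```python
def rle_compress(record):
    """Toy run-length encoding over a whole record buffer.
    Each maximal run of identical bytes becomes one (count, byte) pair,
    so runs that cross field boundaries compress as a single unit."""
    out = []
    i = 0
    while i < len(record):
        j = i
        while j < len(record) and record[j] == record[i]:
            j += 1
        out.append((j - i, record[i]))
        i = j
    return out

# Two adjacent fields, each zero-padded: the zero run crosses the
# field boundary and compresses to one pair instead of two.
record = b'AB\x00\x00\x00' + b'\x00\x00\x00CD'
rle_compress(record)
```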
> what about the index ? is their any difference for the index to be
> on a smallint or a char(1) field ?

Indexes are more complicated. All index keys are compared
as strings of bytes - all datatypes are manipulated into
a format that produces the right result when compared byte
by byte, after truncating non-significant bytes and sprinkling
in tag bytes to indicate breaks between compound keys.
Starting with single-key indexes, all numeric and
date types except bigint (64-bit integer) are converted to
double precision floating point. Because bigint is more
precise than double precision, it's handled as its own
type ... there's a proposal in architecture to extend
the double representation ... but that's outside this
discussion.
So your smallint field becomes an eight-byte double
precision number. Then the trailing zeros are removed
because they're not significant, so it's small again.
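To make the convert-then-truncate step concrete, here's a sketch: pack the value as a big-endian double, then strip the trailing zero bytes, which carry no information for a byte-by-byte comparison. The real engine also has to massage the sign and exponent so negative values sort correctly; this toy version ignores that and is not Firebird's actual key encoding.

```python
import struct

def numeric_key(value):
    """Sketch of a numeric index key: an 8-byte big-endian double
    with non-significant trailing zero bytes truncated.
    (Illustration only; sign/exponent handling is omitted.)"""
    key = struct.pack('>d', float(value))   # 8 bytes, most significant first
    return key.rstrip(b'\x00')              # drop trailing zeros

numeric_key(3)    # a small value keeps only a couple of bytes
numeric_key(255)  # still far shorter than the full 8 bytes
```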
All string types are converted to a format that represents
the collation order - which often expands the size of the
key because some collations have three or even four levels
of significance. A CHAR(1) OCTET has only one level, so
it stays as a single byte.
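A toy two-level collation key shows how those levels expand the key. Primary weights (letter identity, ignoring case) come first, then a separator, then secondary weights (case), so one byte-by-byte comparison settles both levels in the right priority order. Real collations (ICU-style, with three or four levels) are far more elaborate; this is only an illustration.

```python
def collation_key(s):
    """Toy two-level collation key: primary weights (lowercased letter)
    followed by a separator and secondary weights (case). Comparing
    keys byte-by-byte orders by letter first, then by case."""
    primary = bytes(ord(c.lower()) for c in s)
    secondary = bytes(1 if c.isupper() else 0 for c in s)
    return primary + b'\x00' + secondary

# The key is longer than the source string: each level adds bytes,
# which is why multi-level collations expand the index key.
collation_key('a')   # 3 bytes for a 1-character string
```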
Then you get the effect of prefix compression. The first
key on each index page is complete, but subsequent keys
do not include the leading bytes which are identical to
the prior key.
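Prefix compression can be sketched like this: over a sorted run of keys, store the first key whole, and for each later key store only the count of leading bytes it shares with its predecessor plus the differing suffix. Actual index-page layout differs; this just shows the idea.

```python
def prefix_compress(keys):
    """Sketch of prefix compression on sorted index keys: each entry
    is (shared_prefix_length, suffix) relative to the previous key.
    The first key shares nothing, so it is stored complete."""
    out = []
    prev = b''
    for key in keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        out.append((shared, key[shared:]))
        prev = key
    return out

# 'apply' shares 4 leading bytes with 'apple', so only b'y' is stored.
prefix_compress([b'apple', b'apply', b'banana'])
```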
On the whole, I'd guess that you'll get denser indexes with
CHAR(1). But do check the results of gstat. In theory,
theory and measurement should match, but...
Cheers,
Ann