Subject | RE: [ib-support] Re: Geographic data |
---|---|
Author | Alan McDonald |
Post date | 2002-10-07T22:24:36Z |
Ann,
can you explain what SET STATISTICS and this computed value does? How does
it speed up selects?
Alan
-----Original Message-----
From: Ann W. Harrison [mailto:aharrison@...]
Sent: Tuesday, 8 October 2002 7:42
To: ib-support@yahoogroups.com; ib-support@yahoogroups.com
Subject: Re: [ib-support] Re: Geographic data
At 09:48 PM 10/3/2002 +0000, danielberstein wrote:
numbers are stored in eight bytes, which limits the amount of
compression that can happen. Alphanumerics tend to be longer...
Here's more detail on what's happening...
When Firebird gets
a request to store something in an index or looks something
up in an index, it converts the "something" to a key. Keys
are arrays of bytes that can be compared using signed byte
comparisons without any consideration of data type.
The conversion works like this.
If the original datatype was "numeric" and between 16 and
18 digits of precision, it is represented as a 64 bit integer.
Other numerics, dates, time, and timestamps are represented as
double precision floating point. The floating point value is
reconstructed so it compares bytewise - moving the sign and
rounding up the exponent for IEEE float.
Character strings are expanded following their collation rules.
Once each segment of the key has been converted to a string
of bytes that compare numerically, trailing blanks are deleted
from strings and trailing zeros are deleted from numerics.
If the key is a single column, that's all that happens before
the key is stored.
If the key is multi-column, each segment is padded out to a
multiple of 4. At that point, the datatype doesn't matter
at all any more.
After every 4 bytes, Firebird adds a marker
every fifth byte. The marker is a byte indicating which
segment the value belongs to. For example, consider a three
column key representing first name, middle name, and last
name. Suppose
the key value was a name:
"HARRISON ", "ANN ", "E ".
Each part would have its trailing blanks suppressed and
be padded out to a multiple of 4
"HARRISON", "ANN ", "E "
Then the segment marker bytes are introduced.
"HARR0ISON0", "ANN 1", "E 2".
Then the results are concatenated:
"HARR0ISON0ANN 1E 2".
Without all that folderol, truncating trailing spaces would
result in confusion...
"HARRISON ", "ANNE ", " "
and
"HARRISON ", "ANN ", "E "
would be identical. The segment count eliminates that problem -
"HARR0ISON0ANN 1E 2". = Harrison, Ann, E
"HARR0ISON0ANNE1 2". = Harrison, Anne
Ok. That's what happens when a key is built. When it is stored,
it gets stuffed into a page (aka bucket) of index key and pointer
pairs. It goes between the next lower value and the next higher.
(No real surprise there). As it gets snuggled in, Firebird checks
to see how many leading bytes it has in common with the next lower
value. Take the two keys created above.
Assuming that the previous value was "IZETTA, GENE, N" the first
key/pointer pair would be stored as
0, 20, <pointer>, "HARR0ISON0ANN 1E 2"
The first 0 means that no prefix characters were suppressed. The
20 means that the key length is 20 bytes. The <pointer> is the
dbkey of actual record. And the quoted bit is the composed key.
The second would be stored as
13, 7, <pointer>, "E1 2"
The 13 indicates that the first 13 bytes are the same as the
previous index entry. The 7 means that there are seven bytes
of key value after the duplicate portion. "E1 2" are the
seven bytes.
if there were a duplicate HARRISON, ANNE, it would
be stored like this
20, 0, <pointer>, ""
Regards,
Ann
www.ibphoenix.com
We have answers.
To unsubscribe from this group, send an email to:
ib-support-unsubscribe@egroups.com
Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.
[Non-text portions of this message have been removed]
can you explain what SET STATISTICS and this computed value does? How does
it speed up selects?
Alan
-----Original Message-----
From: Ann W. Harrison [mailto:aharrison@...]
Sent: Tuesday, 8 October 2002 7:42
To: ib-support@yahoogroups.com; ib-support@yahoogroups.com
Subject: Re: [ib-support] Re: Geographic data
At 09:48 PM 10/3/2002 +0000, danielberstein wrote:
>So only alpanumerical columns are index-compressed? Did I missedErr, yes, I guess I wasn't very clear. The difference is that
>something?
numbers are stored in eight bytes, which limits the amount of
compression that can happen. Alphanumerics tend to be longer...
Here's more detail on what's happening...
When Firebird gets
a request to store something in an index or looks something
up in an index, it converts the "something" to a key. Keys
are arrays of bytes that can be compared using signed byte
comparisons without any consideration of data type.
The conversion works like this.
If the original datatype was "numeric" and between 16 and
18 digits of precision, it is represented as a 64 bit integer.
Other numerics, dates, time, and timestamps are represented as
double precision floating point. The floating point value is
reconstructed so it compares bytewise - moving the sign and
rounding up the exponent for IEEE float.
Character strings are expanded following their collation rules.
Once each segment of the key has been converted to a string
of bytes that compare numerically, trailing blanks are deleted
from strings and trailing zeros are deleted from numerics.
If the key is a single column, that's all that happens before
the key is stored.
If the key is multi-column, each segment is padded out to a
multiple of 4. At that point, the datatype doesn't matter
at all any more.
After every 4 bytes, Firebird adds a marker
every fifth byte. The marker is a byte indicating which
segment the value belongs to. For example, consider a three
column key representing first name, middle name, and last
name. Suppose
the key value was a name:
"HARRISON ", "ANN ", "E ".
Each part would have its trailing blanks suppressed and
be padded out to a multiple of 4
"HARRISON", "ANN ", "E "
Then the segment marker bytes are introduced.
"HARR0ISON0", "ANN 1", "E 2".
Then the results are concatenated:
"HARR0ISON0ANN 1E 2".
Without all that folderol, truncating trailing spaces would
result in confusion...
"HARRISON ", "ANNE ", " "
and
"HARRISON ", "ANN ", "E "
would be identical. The segment count eliminates that problem -
"HARR0ISON0ANN 1E 2". = Harrison, Ann, E
"HARR0ISON0ANNE1 2". = Harrison, Anne
Ok. That's what happens when a key is built. When it is stored,
it gets stuffed into a page (aka bucket) of index key and pointer
pairs. It goes between the next lower value and the next higher.
(No real surprise there). As it gets snuggled in, Firebird checks
to see how many leading bytes it has in common with the next lower
value. Take the two keys created above.
Assuming that the previous value was "IZETTA, GENE, N" the first
key/pointer pair would be stored as
0, 20, <pointer>, "HARR0ISON0ANN 1E 2"
The first 0 means that no prefix characters were suppressed. The
20 means that the key length is 20 bytes. The <pointer> is the
dbkey of actual record. And the quoted bit is the composed key.
The second would be stored as
13, 7, <pointer>, "E1 2"
The 13 indicates that the first 13 bytes are the same as the
previous index entry. The 7 means that there are seven bytes
of key value after the duplicate portion. "E1 2" are the
seven bytes.
if there were a duplicate HARRISON, ANNE, it would
be stored like this
20, 0, <pointer>, ""
Regards,
Ann
www.ibphoenix.com
We have answers.
To unsubscribe from this group, send an email to:
ib-support-unsubscribe@egroups.com
Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.
[Non-text portions of this message have been removed]