Subject | Re: Geographic data |
---|---|
Author | danielberstein |
Post date | 2002-10-08T13:11:23Z |
Thanks for that post Ann! Have you ever thought about writing a FB/IB
book? Last time I checked Amazon there were *no* IB (let alone FB)
books.
CYA.
book? Last time I checked Amazon there were *no* IB (let alone FB)
books.
CYA.
--- In ib-support@y..., "Ann W. Harrison" <aharrison@i...> wrote:
> At 09:48 PM 10/3/2002 +0000, danielberstein wrote:
>
> >So only alpanumerical columns are index-compressed? Did I missed
> >something?
>
> Err, yes, I guess I wasn't very clear. The difference is that
> numbers are stored in eight bytes, which limits the amount of
> compression that can happen. Alphanumerics tend to be longer...
>
> Here's more detail on what's happening...
>
> When Firebird gets
> a request to store something in an index or looks something
> up in an index, it converts the "something" to a key. Keys
> are arrays of bytes that can be compared using signed byte
> comparisons without any consideration of data type.
>
> The conversion works like this.
>
> If the original datatype was "numeric" and between 16 and
> 18 digits of precision, it is represented as a 64 bit integer.
>
> Other numerics, dates, time, and timestamps are represented as
> double precision floating point. The floating point value is
> reconstructed so it compares bytewise - moving the sign and
> rounding up the exponent for IEEE float.
>
> Character strings are expanded following their collation rules.
>
> Once each segment of the key has been converted to a string
> of bytes that compare numerically, trailing blanks are deleted
> from strings and trailing zeros are deleted from numerics.
> If the key is a single column, that's all that happens before
> the key is stored.
>
> If the key is multi-column, each segment is padded out to a
> multiple of 4. At that point, the datatype doesn't matter
> at all any more.
>
> After every 4 bytes, Firebird adds a marker
> every fifth byte. The marker is a byte indicating which
> segment the value belongs to. For example, consider a three
> column key representing first name, middle name, and last
> name. Suppose
> the key value was a name:
> "HARRISON ", "ANN ", "E ".
>
> Each part would have its trailing blanks suppressed and
> be padded out to a multiple of 4
>
> "HARRISON", "ANN ", "E "
>
> Then the segment marker bytes are introduced.
>
> "HARR0ISON0", "ANN 1", "E 2".
>
> Then the results are concatenated:
>
> "HARR0ISON0ANN 1E 2".
>
> Without all that folderol, truncating trailing spaces would
> result in confusion...
>
> "HARRISON ", "ANNE ", " "
> and
> "HARRISON ", "ANN ", "E "
>
> would be identical. The segment count eliminates that problem -
>
> "HARR0ISON0ANN 1E 2". = Harrison, Ann, E
> "HARR0ISON0ANNE1 2". = Harrison, Anne
>
> Ok. That's what happens when a key is built. When it is stored,
> it gets stuffed into a page (aka bucket) of index key and pointer
> pairs. It goes between the next lower value and the next higher.
> (No real surprise there). As it gets snuggled in, Firebird checks
> to see how many leading bytes it has in common with the next lower
> value. Take the two keys created above.
>
> Assuming that the previous value was "IZETTA, GENE, N" the first
> key/pointer pair would be stored as
>
> 0, 20, <pointer>, "HARR0ISON0ANN 1E 2"
>
> The first 0 means that no prefix characters were suppressed. The
> 20 means that the key length is 20 bytes. The <pointer> is the
> dbkey of actual record. And the quoted bit is the composed key.
>
> The second would be stored as
>
> 13, 7, <pointer>, "E1 2"
>
> The 13 indicates that the first 13 bytes are the same as the
> previous index entry. The 7 means that there are seven bytes
> of key value after the duplicate portion. "E1 2" are the
> seven bytes.
>
> if there were a duplicate HARRISON, ANNE, it would
> be stored like this
>
> 20, 0, <pointer>, ""
>
>
>
>
>
> Regards,
>
> Ann
> www.ibphoenix.com
> We have answers.