Subject | Re: [firebird-support] Indexing and matching technology |
---|---|
Author | Ann W. Harrison |
Post date | 2005-02-22T16:38:14Z |
Kjell Rilbe wrote:
with the record, expand the record, perform the comparison on the field,
and move on.
For indexed criteria, it's a bit more complicated. Firebird uses only
one type of index - a b-tree variant which has some quirks in key
generation, compression, and retrieval. Terminal nodes in the b-tree
contain record pointers, not data.
Keys are built from the data values into byte strings that compare
naturally. The key generation code starts by converting all numeric
values and dates to double precision. The one exception is that int64
values remain as int64. The double precision number is manipulated to
compare bytewise.
The code then compresses suffixes of each segment of the key,
eliminating trailing zeros in numbers and characters of type octet, and
trailing spaces in other types.
For a multi-segment key the pieces are put together with check bytes
that allow varying length segments to compare correctly. Each segment
is padded to a multiple of 4 bytes, and after every four bytes a new
byte is introduced that holds the position of the current segment.
The code that stores a key on an index page performs prefix compression,
eliminated all the bytes that are the same as the previous key value on
the page.
Indexed retrieval is a three step process. First the system builds bit
maps of all index entries that meet the criteria in each step of the
restriction. The bit maps are ordered by data page. Then the bit maps
are combined with 'and' and 'or' operations as appropriate. Finally the
system retrieves the records indicated by the resultant bit map.
secret protection, the architecture list
(firebird-architect@yahoogroups.com) is the right place.
Regards,
Ann
> Hi,For non-indexed criteria, we do just what you'd expect - read the page
>
> Coming from a company (now defunct) that had a very specialized DB
> engine for extremely fast matching of thousands of criteria, I'd be
> interested to learn hos Firebird works in this regard
with the record, expand the record, perform the comparison on the field,
and move on.
For indexed criteria, it's a bit more complicated. Firebird uses only
one type of index - a b-tree variant which has some quirks in key
generation, compression, and retrieval. Terminal nodes in the b-tree
contain record pointers, not data.
Keys are built from the data values into byte strings that compare
naturally. The key generation code starts by converting all numeric
values and dates to double precision. The one exception is that int64
values remain as int64. The double precision number is manipulated to
compare bytewise.
The code then compresses suffixes of each segment of the key,
eliminating trailing zeros in numbers and characters of type octet, and
trailing spaces in other types.
For a multi-segment key the pieces are put together with check bytes
that allow varying length segments to compare correctly. Each segment
is padded to a multiple of 4 bytes, and after every four bytes a new
byte is introduced that holds the position of the current segment.
The code that stores a key on an index page performs prefix compression,
eliminated all the bytes that are the same as the previous key value on
the page.
Indexed retrieval is a three step process. First the system builds bit
maps of all index entries that meet the criteria in each step of the
restriction. The bit maps are ordered by data page. Then the bit maps
are combined with 'and' and 'or' operations as appropriate. Finally the
system retrieves the records indicated by the resultant bit map.
> and discuss if aIf the technology is unencumbered - not subject to patents or trade
> similar technology could be implemented in firebird. This technology can
> match thousands of criteria on million record DB:s in fractions of a
> second, regardless of selectivity.
>
> What is the appropriate forum for this?
>
secret protection, the architecture list
(firebird-architect@yahoogroups.com) is the right place.
Regards,
Ann