firebird-support - Re: [firebird-support] Re: Composite index

Subject	Re: [firebird-support] Re: Composite index - issue or not existing feature?
Author	Ann Harrison
Post date	2016-03-14T18:31:41Z

On Mon, Mar 14, 2016 at 3:41 AM, liviuslivius liviuslivius@... [firebird-support] <firebird-support@yahoogroups.com> wrote:

W dniu 2016-03-14 08:36:40 użytkownik Dmitry Yemanov dimitr@... [firebird-support] <firebird-support@yahoogroups.com> napisał:

> SELECT * FROM dbo.XXX X WHERE X.A BETWEEN 2 AND 30 *AND* X.B BETWEEN 5
> AND 60
...
> As you can see only A key is used but B key should be also used.
> I am missing something?

Yes, you do. If the first segment is matched for non-equality, then the
following segments cannot be used.

Why?

Index is a Tree? And if i found VALUE 2 in A key then i can fast find value 5 in sub key (leaf)

You scan through keys in A, and then in finded nodes you look for leafs in B

Expanding on Dmitry's answer...

Yes, the index is a tree, but it's not a tree the way you imagine it. At least not the way
I imagine you imagine it, which is something like this:. The top of the index has ranges
of values for the first field in the index, which point to narrower ranges of values for for
the first field, going down to the point where you get a single value for the first field with
ranges of values for the second field hanging under it. Indexes like that get unbalanced
easily and tend to be very deep.

A Firebird index key is a single value built out of all parts of a compound key with some
extra stuff dribbled around so you can tell the difference between the pairs of values
"deadlock" "search" and "dead" "locksearch". The whole key is mangled so it compares
correctly bytewise. An index entry has a prefix, a key - possibly missing some front
bytes - and the database key (record id) of the matching record. A single level Firebird
index is just a stream of index entries. It's a little more complicated than that to reduce
the cost of reading large pages(*).

When there are too many entries to fit on a single page, Firebird splits the page and
creates a new level(**) above the level with index keys and record ids. The upper level
has index keys, record ids, and the page number of the lower level index page that
starts with that index key and record id(***). The lower levels have pointers to their
upper level, and to their right and left neighbors.

Each time an index page splits(****), the first key and record id pair of the new page
gets pushed up to the next level. Eventually the upper level has to split and a new,
even higher level is created pointing to the next layer down.

Which is a very long winded way of saying that the Firebird index handling code
hasn't a clue that it's looking at a compound index, let alone where the values are
split. If you want to do range retrievals on several fields, create an index for each
one. Then, when Firebird executes the search, it will search one index, building
a bitmap of record ids in that range, then search the other building a second bitmap
and combine the two bitmaps to retrieve only those records that match both criteria.

Good luck,

Ann

(*) Index entries are variable length so you can't use a binary search on an index
page. When pages got bigger that 4KBb, the time spent on index searches went
mostly into reading, on average, half the page. Now indexes in databases with
large page sizes have an internal index that cuts the average read of a 16Kb page
from 8Kb to 1Kb

(**) To simplify internal bookkeeping, the first physical page allocated to an index
will always be the top page, so there's some fancy data shuffling when a new level
is created. First Firebird creates two new pages, then it moves the data from the
old top level to the new pages and fills the old top level with pointers down to the
new pages.

(***) The record id is propagated to the upper levels to make each index entry unique,
even when the keys aren't. That saves a lot of time in index garbage collection but
isn't otherwise interesting.

(****) The way a page split works depends on where the entry that caused the split
would go on the page. If the new entry goes in the middle of the page, half the entries
are moved to the new page. If it would have been the last entry, most of the entries
stay on the old page and only enough to start the new page goes on that page. So
if you're loading records in key order - or using a sequence to create your keys - the
index stays dense.