Subject | Re: [Firebird-Architect] Re: Indexing tales - part 1 - Siblings |
---|---|
Author | Ann W. Harrison |
Post date | 2006-10-12T17:01:17Z |
= m. Th = wrote:
there. Locking is done on the index pages, not on values. Pages are
linked left and right. The index is built from the bottom up, with
record pointers only in the lowest level. Upper levels are exclusively
pointers to lower levels. The index is never "unbalanced" in the sense
that some branches are longer than others.
For concurrency, it's critical not to lock whole indexes for any
change, except dropping and recreating the index. New values can be
added to a pages, pages can split, splits can propagate up, and pages
can be recombined without ever holding a write lock on more than one
page at a time. That turns out to be a critical requirement for
avoiding internal deadlocks.
It's also important that multiple transactions be able to insert
and remove index entries at the same time. And the algorithms
must work when transactions are running in classic mode -
separate page caches.
and would not work with most long strings because of the restriction
on the length of index keys. Containing is often used on blobs and
long varchars - fields over 1Kb.
The current index code is not set up to handle a search that finds
records scattered throughout the index. Even range retrievals are
dense - keys don't match, then they do, then they don't.
However, within those limits, theoretically, you could write code
that expanded each index key, up-cased it, and scanned it for the
value from the containing. That would, probably, be faster than
reading records. But it's the wrong way to do a keyword index and
I'm not at all convinced that there is a huge market demand for
the ability to search for substrings.
Regards,
Ann
>Before you redesign the index structure, do try to understand what's
> But this will lock the entire tree for scan_fn or for Change_fn if a
> function from the other group is holding it.
there. Locking is done on the index pages, not on values. Pages are
linked left and right. The index is built from the bottom up, with
record pointers only in the lowest level. Upper levels are exclusively
pointers to lower levels. The index is never "unbalanced" in the sense
that some branches are longer than others.
For concurrency, it's critical not to lock whole indexes for any
change, except dropping and recreating the index. New values can be
added to a pages, pages can split, splits can propagate up, and pages
can be recombined without ever holding a write lock on more than one
page at a time. That turns out to be a critical requirement for
avoiding internal deadlocks.
It's also important that multiple transactions be able to insert
and remove index entries at the same time. And the algorithms
must work when transactions are running in classic mode -
separate page caches.
> 1. can you optimize the 'CONTAINING' predicate using a full scan indexTheoretically, yes, though it would require reading the entire index,
> (given that the index has a sensible lower selectivity than 1)
and would not work with most long strings because of the restriction
on the length of index keys. Containing is often used on blobs and
long varchars - fields over 1Kb.
The current index code is not set up to handle a search that finds
records scattered throughout the index. Even range retrievals are
dense - keys don't match, then they do, then they don't.
However, within those limits, theoretically, you could write code
that expanded each index key, up-cased it, and scanned it for the
value from the containing. That would, probably, be faster than
reading records. But it's the wrong way to do a keyword index and
I'm not at all convinced that there is a huge market demand for
the ability to search for substrings.
Regards,
Ann