Subject Re: [firebird-support] Firebird 2.0 Indexing
Author Ann W. Harrison
buppcpp wrote:
> Were the changes to the indexes in 2.0 suppose to provide more than
> larger indexes?

Yes, and they do. However, you need to distinguish between the
different problems you identify as "index performance".
>
>
> Ex. SELECT distinct store_no from mytable; (Does a table scan)

That's not a problem with the index. It's the optimizer that decides
whether to use an index or not, based on the access paths it knows how
to create.

As you probably have heard by now, a multi-generational record database
engine can not resolve queries from the index without reference to the
actual data records. Multi-generational indexes include entries that,
from the perspective of any given transaction, are not yet committed or
have been superseded. The only way to know if a record is valid for a
particular transaction is to read it. Since the records have to be
checked for validity, and since reading records in index order is slower
than reading records in storage order, a natural scan and sort is the
fastest (available) access path for this query.

It would be possible, in theory, to resolve this query without reading
all the records and sorting them. The engine could read the index until
it comes to a value it hasn't seen, then read that record to validate it
for the current transaction. If the record is valid, add it to the
result set and look for the next distinct value. If the record is not
valid, read the next record with the same value, validate it, and repeat.

For some value distributions, that would win. For others it would lose,
sometimes lose by a lot. If, for example, 85% of your records have
distinct values, and the values were randomly distributed through the
data pages, you'd read each data page several times - rattling the cache
and adding lots of I/0.
>
>
> SELECT * from mytable where store_no > 2; (Uses store_no index)
>
> It takes 3.27 minutes
> On the database I want to convert from, it takes 1.57 minutes.

Again, that's a problem with multi-generational indexes. It takes
longer to read the data than to resolve the query from the index. On
the other hand, readers don't block writers. An architecture based on
multi-generational records is a bad design for a single user database,
as your tests show.
>
> I for one would prefer to have the indexing system fixed in 2.0,
> because I think we may need to sit on it for awhile, while 3.0 gets
> all of it's bugs knocked out. Plus, 3.0 will have the same
> problems, if it's not taken care of now, but with the overhead of
> SMP code (for 1 CPU systems).

We expect that other performance improvements will mask the cost of SMP
support in V3, even on single CPU systems. On the other hand, the
design target of Firebird is high-concurrency read/write databases. If
your application targets single user, read/only access, you might
consider a different database.


Regards,


Ann