Subject Re: Indexing help
Author Svein Erling Tysvær
Hello Kjell, my comments are within:

--- In firebird-support@yahoogroups.com, Kjell Rilbe wrote:
> Hi,
>
> I'm evaluating Firebird 1.5 and need some hints about how I should
> index my tables for optimum performance.
>
> I have a db that's practically read-only, so I can ignore costs for
> maintaining indexes when doing updates.
>
> The DB is about 2 Gbyte in size. One master table with about 1
> million records and about 50 columns. About half of the columns are
> only used for output data, i.e. in the select part, and these are
> 5-80 characters.

Good, number of records and fields sounds similar to what I'm used to.

> The other half of the columns are used only in the where part and
> are rather short: 1-5 characters.
>
> A few child tables with 2-12 child records for each master record.
>
> I will be doing selects like this:
>
> select count(*),
> count(Master.Field1),
> count(Master.Field2),
> ...
> from Master
> inner join Detail1 on Master.Id = Detail1.Id
> inner join Detail2 on Master.Id = Detail2.Id
> where (Master.FieldM1 = 'value_M1_1'
> or Master.FieldM1 = 'value_M1_2'
> ...)
> and (Master.FieldM2 = 'value_M2_1'
> or Master.FieldM2 = 'value_M2_2'
> ...)
> and (Detail1.FieldD1 = 'value_D1_1'
> or Detail1.FieldD1 = 'value_D1_2'
> ...)
> and (Detail2.FieldD2 = 'value_D2_1'
> or Detail2.FieldD2 = 'value_D2_2'
> ...)
> group by Master.Id
>
> This will be executed, by each user, multiple times until he is
> satisfied with the returned counts. He will then execute the actual
> data select once, using the same criteria:
>
> select Field1,
> Field2,
> ...
> from ...
> where ...
>
> I need both of these queries to execute as quickly as possible.
>
> I cannot know beforehand exactly which columns will be used in the
> where part. The user can choose to base the query on an arbitrary
> subset of the available columns. Note that I only want distinct
> master records in the output, hence the group by. (Child tables are
> only used in the where part).

With 2-12 child records linking to each Master.ID, I'd say that when
you do a prepare, then the first table of the resulting plan could use
several indexes, whereas the other tables should only use the index
associated with the ID field (excessive use of indexes is the most
common cause for my queries running too slowly).

> How should I index this DB?

Well, I do not know, I do not know which fields will be selective or
not in your searches. Though I would be tempted to write something
that prepared the plan and depending on which table were first in the
plan, add one or more (... or 2=0) at strategic places in the query to
eliminate the use of some indexes.

> I know I can rewrite the query using exists for criteria in child
> tables. Would this improve performance? And if I do that, how should
> I index?

Well, you can do that, the only difference that would mean to my above
suggestion, is that you could fix the (...or 2=0) to set places of
each subquery since you've reduced the choices for Arno (the
optimizer). Though this could be bad if running queries with where
criteria only on child tables (it would mean NATURAL on the master
table). I doubt it would improve performance, but I don't know for
certain (I think I use EXISTS more than the average person on this
list, but JOIN can also be very good).

> The number of returned records would usually be a few hundred to a
> couple of thousand, but could potentially go up to tens or even
> hundreds of thousands.

Just tell your users that counting or returning 100000 records do take
some time and could make both them and their colleagues quite
frustrated if using SuperServer.

HTH,
Set

-true, I know nothing about physics and maybe I would translate æ to ä
if I lived in Sweden, but you wouldn't get the pronounciation right -
ä sounds between e and æ