Subject Re: Slower fetchall after creating indices
Author Adam
--- In firebird-support@yahoogroups.com, Ali Gökçen <ali_gokcen@...>
wrote:
>
> Hi Adam,
>
> >
> > SELECT
> > FIRST 10
> > Instantie.id
> > FROM Instantie
> > WHERE Instantie.deleted=0
> > ORDER BY Instantie.telefoon ASC
> >
>
> FIRST 10 doesn't effects the speed here, problem is not about 10,000
> rows fetched. problem is:
> you cant find out deleted instanties without full-scan.
> so, your flagged as deleted 10 rows may be bottom of table naturally,
> (in case of natural scan)

Ali,

That is not true.

Firstly, you do NOT need to do a full-scan to find deleted instances.
The query I wrote has an order by clause combined with a first N,
meaning that it will return the smallest 10 records that meet the
criteria. Without an index, the plan will look something like

PLAN SORT ((Instantie NATURAL))

In otherwords, internally Firebird will need to read EVERY record in
the table, then sort it, then return only the first 10. (So it has
unnecessarily read and sorted up to 1000 times as many records as it
needs).

With the index, the plan will look something like

PLAN (Instantie ORDER IX_telefoon)

It will read the smallest record from the telefoon field, and check if
that is deleted, then go to the second smallest and check if that is
deleted and so on.

Unless you are particularly unlucky, and a huge majority the records
with the smallest telefoon are all deleted, this is going to be much
faster, because once it has returned 10, it simply stops. What you
have described about requiring a full table scan is the theoretical
worst case scenario, where all records are deleted except for 10, and
those 10 had the highest telefoon values.

> or your deleted rows may be biggest numbers.(in case of indexed scan)
> Solution is easy:
> create an index on deleted field. you will get in faster way even you
> have billion of rows.

That index would be most likely pointless. An index is only useful if
by looking at the index I no longer need to read a significant
proportion of the data pages. Or to put it in more simple terms, an
index is only helpful if the amount of reads that are saved by looking
at the index is more than the cost of reading the index itself.

The selectivity of the index would be very poor, after all there are
only two possible values, so it would only be helpful if there is a
significant skew in one direction (say all but 50 records are not
deleted and you were running a query to retrieve deleted records).

Adam