Subject Re: Slower fetchall after creating indices
Author Adam
--- In, Ali Gökçen <ali_gokcen@...>
> > > Hi Adam,
> > >
> > > >
> > > > SELECT
> > > > FIRST 10
> > > >
> > > > FROM Instantie
> > > > WHERE Instantie.deleted=0
> > > > ORDER BY Instantie.telefoon ASC
> > > >
> > >
> > > FIRST 10 doesn't effects the speed here, problem is not about
> 10,000
> > > rows fetched. problem is:
> > > you cant find out deleted instanties without full-scan.
> > > so, your flagged as deleted 10 rows may be bottom of table
> naturally,
> > > (in case of natural scan)
> >
> > Ali,
> >
> > That is not true.
> >
> > Firstly, you do NOT need to do a full-scan to find deleted
> instances.
> > The query I wrote has an order by clause combined with a first N,
> > meaning that it will return the smallest 10 records that meet the
> > criteria. Without an index, the plan will look something like
> >
> > PLAN SORT ((Instantie NATURAL))
> Ok, do we pry to GOD to wish that first 10 rows to be at top of
> records in naturally order?

Even if they were, because you would be reading in natural order, it
would need to read them and sort them ALL before it could obey the
order by clause.

> who will guarantee to you there are exists deleted 10 rows?
> can you find out them without natural full-scan that table?
> imagine, there are 10 million records and 5 deleted rows,
> can you get 10 deleted rows without scan ALL of TABLE?
> can you get 10 deleted rows quickly if they are last rows?
> No, of course.
> what will happen, if he really needs 10,000 deleted rows for
> application?

I think you are missing my point.

The index on telefoon DOES NOT CHANGE whether they meet the where
clause or not. If the index is available, Firebird will read the
records in the order they are to be returned in. It will only
actually return the record to the client application if it meets the
where criteria of course. It may be the case that it must go through
500 records before it encounters 10 that match the where criteria,
but that may still be more efficient than reading ALL 10000 records
in the table.

If the index does not exist, then Firebird must evaluate ALL POSSIBLE
records, and then sort them into ascending order by telefoon, and
then it can return the first 10.

> we shouldt give data specific(dynamic storage) solution to users,
> we should advice them scientific solutions.
> there may be zero to 4 billion rows in table and
> there may be zero to 4 billion deleted rows in table too.
> we shouldn't to forget to calculate extrem points.

Optimisation is largely cost estimation which is largely statistical
analysis. You deal in expected values and distributions, standard
deviations etc.

The only way to objectively measure whether an index is useful or not
is to run a query with the index and run the query without the index,
shutting down the server in between the tests to avoid the cache
influencing your results, and then to state that one of them was
faster with a given query. This is of course counter-productive. I
may know which way can work it out the fastest, but I also know that
it has taken me longer than the worst choice I could have made.

So I can generalise:

If more than 10% of the records of the table will meet your where
criteria, and you are selecting a small percentage of the records
using select first, the ordered reads will be significantly faster on

It is of course possible to insert such values to make ordered reads
much more expensive by making them check a significant number of
records that do not fit the where criteria, some may call this 'worst
case'. Whilst the worst case must be considered, a reality check must
also be made to see whether such a worst case is likely. Is the net
result that the index is useful or counter-productive.