Subject: Re: Slower fetchall after creating indices
Author: Ali Gökçen
> > Hi Adam,
> >
> > >
> > > SELECT
> > > FIRST 10
> > > Instantie.id
> > > FROM Instantie
> > > WHERE Instantie.deleted=0
> > > ORDER BY Instantie.telefoon ASC
> > >
> >
> > FIRST 10 doesn't affect the speed here; the problem is not about
> > 10,000 rows being fetched. The problem is: you can't find the
> > deleted instanties without a full scan, so your 10 rows flagged
> > as deleted may naturally sit at the bottom of the table (in the
> > case of a natural scan),
>
> Ali,
>
> That is not true.
>
> Firstly, you do NOT need to do a full scan to find deleted
> instances.
> The query I wrote has an order by clause combined with a first N,
> meaning that it will return the smallest 10 records that meet the
> criteria. Without an index, the plan will look something like
>
> PLAN SORT ((Instantie NATURAL))

OK, do we pray to God and hope that the first 10 rows happen to be
at the top of the table in natural order?

Who guarantees you that 10 deleted rows even exist?
Can you find them without a natural full scan of that table?
Imagine there are 10 million records and only 5 deleted rows:
can you get 10 deleted rows without scanning ALL of the TABLE?
Can you get 10 deleted rows quickly if they are the last rows?
Of course not.
And what happens if he really needs 10,000 deleted rows for another
application?
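
To make this concrete, here is a hypothetical variation of his query
that asks for the deleted rows instead (a sketch only; I assume the
same Instantie table and column names):

SELECT FIRST 10
  Instantie.id
FROM Instantie
WHERE Instantie.deleted = 1
ORDER BY Instantie.telefoon ASC

Without a usable index the plan is something like
PLAN SORT ((Instantie NATURAL)), and with only 5 matching rows in
10 million the engine still has to visit every record before it can
answer.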

We shouldn't give users data-specific (dynamic storage) solutions;
we should advise them scientific solutions.
There may be zero to 4 billion rows in the table, and
there may be zero to 4 billion deleted rows in the table too.
We shouldn't forget to account for the extreme points.
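
A quick way to see where a table sits between those extremes is to
measure the real distribution first, for example (a sketch against
the same table):

SELECT deleted, COUNT(*)
FROM Instantie
GROUP BY deleted

Only with those numbers in hand can we judge whether an index on the
flag will help or hurt.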


>
> In other words, internally Firebird will need to read EVERY record in
> the table, then sort it, then return only the first 10. (So it has
> unnecessarily read and sorted up to 1000 times as many records as it
> needs).

FB doesn't need to read every record unless you force it to, via
fetching or an internal calculation, etc.

For example, if I open a cursor on a huge table:

select id from xtable

Firebird doesn't read all the records unless you loop until the end
of the table. If I loop 10 times, FB reads the 10 available records
plus, internally, their old versions.
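
In PSQL terms, the early stop could look something like this (an
untested sketch, assuming Firebird 2.0+ EXECUTE BLOCK syntax; xtable
and the limit of 10 are placeholders):

EXECUTE BLOCK RETURNS (id INTEGER)
AS
  DECLARE VARIABLE cnt INTEGER = 0;
BEGIN
  /* the cursor walks the table lazily, row by row */
  FOR SELECT id FROM xtable INTO :id DO
  BEGIN
    cnt = cnt + 1;
    SUSPEND;
    /* stop after 10 fetches; the rest of the table is never read */
    IF (cnt >= 10) THEN LEAVE;
  END
END

When LEAVE fires, only those 10 records (plus their back versions)
have been touched, no matter how big xtable is.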

>
> With the index, the plan will look something like
>
> PLAN (Instantie ORDER IX_telefoon)
>
> It will read the smallest record from the telefoon field, and check
> if that is deleted, then go to the second smallest and check if that
> is deleted and so on.
>
> Unless you are particularly unlucky, and a huge majority of the
> records with the smallest telefoon are all deleted, this is going
> to be much faster, because once it has returned 10, it simply
> stops. What you have described about requiring a full table scan is
> the theoretical worst case scenario, where all records are deleted
> except for 10, and those 10 had the highest telefoon values.
>
> > or your deleted rows may have the biggest numbers (in the case of
> > an indexed scan). The solution is easy: create an index on the
> > deleted field. You will get it faster even if you have billions
> > of rows.
>
> That index would most likely be pointless. An index is only useful
> if by looking at the index I no longer need to read a significant
> proportion of the data pages. Or to put it in simpler terms, an
> index is only helpful if the amount of reads that are saved by
> looking at the index is more than the cost of reading the index
> itself.
>
> The selectivity of the index would be very poor; after all, there are
> only two possible values, so it would only be helpful if there is a
> significant skew in one direction (say all but 50 records are not
> deleted and you were running a query to retrieve deleted records).
>
> Adam
>

I'm sure about creating a compound index prefixed by the "deleted"
field. :)

Don't forget that there are n rows (plus old versions) to read under
a natural scan each time; this is a big risk in multiuser
environments. We must be economic and fair with shared resources if
we want balanced performance.
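
Something like this is what I have in mind (a sketch; the index name
is mine, and whether the optimizer navigates it for the ORDER BY
depends on the Firebird version):

CREATE INDEX IX_deleted_telefoon ON Instantie (deleted, telefoon);

SELECT FIRST 10
  Instantie.id
FROM Instantie
WHERE Instantie.deleted = 0
ORDER BY Instantie.telefoon ASC

With the equality on deleted matched by the leading segment, the
engine can walk just the matching slice of the index instead of
sorting the whole table.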

Ali