firebird-support - Re: [firebird-support] Re: index slows down the query

Subject	Re: [firebird-support] Re: index slows down the query
Author	Arno Brinkman
Post date	2004-02-25T01:03:54Z

Hi Anand,

> |>PLAN returned by the server

> select count(*)
> FROM "Client" WHERE "Client"."Client No" in
>

> SELECT DISTINCT "Booking"."Client No" FROM "Booking"
> WHERE "Booking"."End Date" >= '02-feb-2001' AND
> "Booking"."Start Date" <= '02-may-2001'
> )
> Plan:
> PLAN (Booking INDEX (IDX_Booking3,IDX_Booking))
> PLAN (Client NATURAL)

I tried the same with your given metadata on my machine, but the plan
returned by the engine on FB1.5 is :

PLAN SORT ((Booking INDEX (IDX_Booking3,IDX_Booking1,IDX_Booking)))

The reason why IDX_Booking1 is not included in your PLAN could be because it
has a very bad selectivity or it isn't activated at all. Could you give also
the statistic numbers that belong to those indices. For example with :

gstat -i -t "Booking" -t "Client" -u sysdba -p masterkey C:\Yourdb.fdb

> |> the DISTINCT is unneeded here.
>
> there can be more than one booking for one client so
> distinct reomove the duplicate rec in the sub query.

With IN-predicate or EXISTS clause that isn't needed, after the first match
is found the evaluation is done.

> |>EXISTS
>
> i tried Exists it works similar to in statement very
> slow.

Those times should be equal in fact.

> >>>>
> SELECT
> count(*)
> FROM
> "Client"
> WHERE
> EXISTS(SELECT 1 FROM "Booking"
> WHERE
> "Booking"."End Date" >= '02-feb-2001' AND
> "Booking"."End Date" <= '02-may-2001' AND
> "Booking"."Start Date" >= '02-feb-2001' AND
> "Booking"."Start Date" <= '02-may-2001' AND
> "Client"."Client No" = "Booking"."Client No")
>
> End Date should be >= '02-feb-2001' and it is
> not necessary that it should be <= '02-may-2001'.
>
> i dont know, why adding index slows down the sql .
> it doesn't takes the index on end date.

I'll try to explain a little about the index stuff here :

The engine first reads the necessary index-part and then combines the
indices together to a final bitmap. This bitmap is used to retrieve the data
from the datapages. (see more about this on
http://www.ibphoenix.com/a533.htm)
If Booking has 300.000 records then every index on that table has at least
also 300.000 index-entries. Let's assume you started in 1998-01-01 with
inserting data and ended today (about 2001 is the middle.. uhh..).
To ask for all "Booking"."End Date" >= '02-feb-2001' this will search for
the starting '02-feb-2001' index-entry and will scan the index to the end.
This will read approximately 150.000 index-entries.
To ask also for "Booking"."Start Date" <= '02-may-2001' this will start
scanning at the first index-entry till beyond '02-may-2001' and also read
approximately 150.000 index-entries.
You also do a equal match on "Booking"."Client No" and the engine will start
searching at the first index-entry which contains "Client"."Client No" and
scan until a different index-entry is found. The question is how many
duplicates "Client No" there are in table "Booking", but lets assume there
are 100 Clients and it needs to read approximately 3.000 index-entries.
Those 3 index-bitmaps are AND-ed together to a final bitmap (which always is
3.000 or less in this example) which will be used for fetching for the
sub-query.

The situation i explained above was only about the sub-query, thus this is
executed for EVERY record in Client. Can you already imagine why it is so
slow with indices ?

You could speed up the thing by choosing the right indices for the work and
try to include all the information you need. Are you sure that an end-date
could be newer as the start-date you're filtering for? The same counts for
start-date. Try to use a lower and a upper value comparision for the dates
this should really speed up the thing. Beside that you could create 2 new
indices which may be interesting :

CREATE ASC INDEX I_BOOKING_CLIENTNO_REQ_CI_DATE
ON "Booking" ("Client No", "Required CI Date");
CREATE ASC INDEX I_BOOKING_CLIENTNO_REQ_CO_DATE
ON "Booking" ("Client No", "Required CO Date");

Pitty enough the optimizer wouldn't use both indices yet with your query,
but we can workaround that by splitting up the sub-query in two sub-queries
:

SELECT
COUNT(*)
FROM
"Client"
WHERE
EXISTS(SELECT 1 FROM "Booking" WHERE
"Booking"."Client No" = "Client"."Client No" AND
"Booking"."Required CI Date" >= '02-feb-2001')
AND
EXISTS(SELECT 1 FROM "Booking" WHERE
"Booking"."Client No" = "Client"."Client No" AND
"Booking"."Required CO Date" <= '02-may-2001')

and even better it would be if you included a lower and upper bound to the
date comparisions.

Well i hope you can do something with the information i gave to you and i'm
curious if one of my suggestions will work for you.

Regards,
Arno Brinkman
ABVisie

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Firebird open source database (based on IB-OE) with many SQL-99 features :
http://www.firebirdsql.org
http://www.firebirdsql.info
http://www.fingerbird.de/
http://www.comunidade-firebird.org/

Support list for Interbase and Firebird users :
firebird-support@yahoogroups.com

Nederlandse firebird nieuwsgroep :
news://80.126.130.81