Subject [firebird-support] Re: index slows down the query
Author anand anand
Hello Arno Brinkman,

Thanks for the information about index stuff here.

|Could you give also
|the statistic numbers that belong to those indices.


Database header page information:
Flags 0
Checksum 12345
Generation 72646
Page size 1024
ODS version 10.1
Oldest transaction 56664
Oldest active 72639
Oldest snapshot 72639
Next transaction 72640
Bumped transaction 1
Sequence number 0
Next attachment ID 0
Implementation ID 16
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Jan 23, 2004 12:04:05
Attributes

Variable header data:
Sweep interval: 20000
*END*


Database file sequence:
File C:\LCHFB\PROHIRE.FDB is the only file

Database log page information:
Creation date
Log flags: 2
No write ahead log

Next log page: 0

Variable log data:
Control Point 1:
File name:
Partition offset: 0 Seqno: 0
Offset: 0
Control Point 2:
File name:
Partition offset: 0 Seqno: 0
Offset: 0
Current File:
File name:
Partition offset: 0 Seqno: 0
Offset: 0
*END*

Analyzing database pages ...

Booking (139)

Index IDX_BOOKING_CLIENTNO_REQ_CI_DAT (8)
Depth: 4, leaf buckets: 3536, nodes: 339755
Average data length: 4.00, total dup: 165370,
max dup: 276
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 3535

Index IDX_BOOKING_CLIENTNO_REQ_CO_DAT (9)
Depth: 4, leaf buckets: 3527, nodes: 339755
Average data length: 4.00, total dup: 167405,
max dup: 158
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 3526

Index IDX_Booking (6)
Depth: 3, leaf buckets: 2112, nodes: 339755
Average data length: 0.00, total dup: 338366,
max dup: 691
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 2112

Index IDX_Booking1 (7)
Depth: 3, leaf buckets: 2112, nodes: 339755
Average data length: 0.00, total dup: 338306,
max dup: 1003
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 1
80 - 99% = 2111

Index IDX_Booking2 (0)
Depth: 3, leaf buckets: 2486, nodes: 339755
Average data length: 1.00, total dup: 10040,
max dup: 8623
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 0
80 - 99% = 2485

Index IDX_Booking3 (1)
Depth: 3, leaf buckets: 2188, nodes: 339755
Average data length: 0.00, total dup: 269480,
max dup: 22425
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 0
80 - 99% = 2187

Index IDX_Booking4 (2)
Depth: 3, leaf buckets: 2098, nodes: 339755
Average data length: 0.00, total dup: 339754,
max dup: 339754
Fill distribution:
0 - 19% = 0
20 - 39% = 1
40 - 59% = 0
60 - 79% = 0
80 - 99% = 2097

Index IDX_Booking5 (3)
Depth: 3, leaf buckets: 2098, nodes: 339755
Average data length: 0.00, total dup: 339721,
max dup: 25141
Fill distribution:
0 - 19% = 0
20 - 39% = 1
40 - 59% = 0
60 - 79% = 0
80 - 99% = 2097

Index IDX_Booking7 (4)
Depth: 3, leaf buckets: 2794, nodes: 339755
Average data length: 1.00, total dup: 195943,
max dup: 139667
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 2794

Index PK_Booking (5)
Depth: 3, leaf buckets: 2464, nodes: 339755
Average data length: 1.00, total dup: 0, max
dup: 0
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 2463

Client (146)

Index IDX_Client (0)
Depth: 3, leaf buckets: 1562, nodes: 252977
Average data length: 0.00, total dup: 252975,
max dup: 250825
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 0
80 - 99% = 1561

Index IDX_Client1 (1)
Depth: 3, leaf buckets: 1574, nodes: 252977
Average data length: 0.00, total dup: 251620,
max dup: 251583
Fill distribution:
0 - 19% = 0
20 - 39% = 1
40 - 59% = 0
60 - 79% = 0
80 - 99% = 1573

Index IDX_Client2 (2)
Depth: 3, leaf buckets: 1766, nodes: 252977
Average data length: 0.00, total dup: 200296,
max dup: 6004
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 1766

Index IDX_Client3 (3)
Depth: 3, leaf buckets: 1768, nodes: 252977
Average data length: 0.00, total dup: 207852,
max dup: 11446
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 1767

Index IDX_Client4 (4)
Depth: 3, leaf buckets: 1707, nodes: 252977
Average data length: 0.00, total dup: 199998,
max dup: 156747
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 1707

Index IDX_Client5 (5)
Depth: 4, leaf buckets: 3392, nodes: 252977
Average data length: 6.00, total dup: 59493,
max dup: 43211
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 3392

Index IDX_Client6 (6)
Depth: 3, leaf buckets: 1657, nodes: 252977
Average data length: 0.00, total dup: 243467,
max dup: 156543
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 1
80 - 99% = 1656

Index PK_Client (7)
Depth: 3, leaf buckets: 1828, nodes: 252977
Average data length: 1.00, total dup: 0, max
dup: 0
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 1827

|could create 2 new
|indices which may be interesting :

after adding indices

Plan:
PLAN (Booking INDEX
(IDX_BOOKING_CLIENTNO_REQ_CI_DAT,IDX_Booking))
PLAN (Client NATURAL)

Adapted plan:
PLAN (Booking INDEX
(IDX_BOOKING_CLIENTNO_REQ_CI_DAT,IDX_Booking))
PLAN (Client NATURAL)

|workaround

workaround is good it took around 5 min but slow when
compared without indices.

planer doesn't select the correct index. I tried
adding plan sort statement in the sql to add all the
three indices i think it did but it took long time to
execute.

i counld find where the problem lies?

regards
Anand



--- Arno Brinkman <firebird@...> wrote:
---------------------------------
Hi Anand,

> |>PLAN returned by the server

> select count(*)
> FROM "Client" WHERE "Client"."Client No" in
>

> SELECT DISTINCT "Booking"."Client No" FROM "Booking"
> WHERE "Booking"."End Date" >= '02-feb-2001' AND
> "Booking"."Start Date" <= '02-may-2001'
> )
> Plan:
> PLAN (Booking INDEX (IDX_Booking3,IDX_Booking))
> PLAN (Client NATURAL)

I tried the same with your given metadata on my
machine, but the plan
returned by the engine on FB1.5 is :

PLAN SORT ((Booking INDEX
(IDX_Booking3,IDX_Booking1,IDX_Booking)))

The reason why IDX_Booking1 is not included in your
PLAN could be because it
has a very bad selectivity or it isn't activated at
all. Could you give also
the statistic numbers that belong to those indices.
For example with :

gstat -i -t "Booking" -t "Client" -u sysdba -p
masterkey C:\Yourdb.fdb


> |> the DISTINCT is unneeded here.
>
> there can be more than one booking for one client so
> distinct reomove the duplicate rec in the sub query.

With IN-predicate or EXISTS clause that isn't needed,
after the first match
is found the evaluation is done.

> |>EXISTS
>
> i tried Exists it works similar to in statement very
> slow.

Those times should be equal in fact.

> >>>>
> SELECT
> count(*)
> FROM
> "Client"
> WHERE
> EXISTS(SELECT 1 FROM "Booking"
> WHERE
> "Booking"."End Date" >= '02-feb-2001' AND
> "Booking"."End Date" <= '02-may-2001' AND
> "Booking"."Start Date" >= '02-feb-2001' AND
> "Booking"."Start Date" <= '02-may-2001' AND
> "Client"."Client No" = "Booking"."Client No")
>
> End Date should be >= '02-feb-2001' and it is
> not necessary that it should be <= '02-may-2001'.
>
> i dont know, why adding index slows down the sql .
> it doesn't takes the index on end date.

I'll try to explain a little about the index stuff
here :

The engine first reads the necessary index-part and
then combines the
indices together to a final bitmap. This bitmap is
used to retrieve the data
from the datapages. (see more about this on
http://www.ibphoenix.com/a533.htm)
If Booking has 300.000 records then every index on
that table has at least
also 300.000 index-entries. Let's assume you started
in 1998-01-01 with
inserting data and ended today (about 2001 is the
middle.. uhh..).
To ask for all "Booking"."End Date" >= '02-feb-2001'
this will search for
the starting '02-feb-2001' index-entry and will scan
the index to the end.
This will read approximately 150.000 index-entries.
To ask also for "Booking"."Start Date" <=
'02-may-2001' this will start
scanning at the first index-entry till beyond
'02-may-2001' and also read
approximately 150.000 index-entries.
You also do a equal match on "Booking"."Client No" and
the engine will start
searching at the first index-entry which contains
"Client"."Client No" and
scan until a different index-entry is found. The
question is how many
duplicates "Client No" there are in table "Booking",
but lets assume there
are 100 Clients and it needs to read approximately
3.000 index-entries.
Those 3 index-bitmaps are AND-ed together to a final
bitmap (which always is
3.000 or less in this example) which will be used for
fetching for the
sub-query.

The situation i explained above was only about the
sub-query, thus this is
executed for EVERY record in Client. Can you already
imagine why it is so
slow with indices ?

You could speed up the thing by choosing the right
indices for the work and
try to include all the information you need. Are you
sure that an end-date
could be newer as the start-date you're filtering for?
The same counts for
start-date. Try to use a lower and a upper value
comparision for the dates
this should really speed up the thing. Beside that you
could create 2 new
indices which may be interesting :

CREATE ASC INDEX I_BOOKING_CLIENTNO_REQ_CI_DATE
ON "Booking" ("Client No", "Required CI Date");
CREATE ASC INDEX I_BOOKING_CLIENTNO_REQ_CO_DATE
ON "Booking" ("Client No", "Required CO Date");

Pitty enough the optimizer wouldn't use both indices
yet with your query,
but we can workaround that by splitting up the
sub-query in two sub-queries
:

SELECT
COUNT(*)
FROM
"Client"
WHERE
EXISTS(SELECT 1 FROM "Booking" WHERE
"Booking"."Client No" = "Client"."Client No" AND
"Booking"."Required CI Date" >= '02-feb-2001')
AND
EXISTS(SELECT 1 FROM "Booking" WHERE
"Booking"."Client No" = "Client"."Client No" AND
"Booking"."Required CO Date" <= '02-may-2001')

and even better it would be if you included a lower
and upper bound to the
date comparisions.

Well i hope you can do something with the information
i gave to you and i'm
curious if one of my suggestions will work for you.


Regards,
Arno Brinkman
ABVisie

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Firebird open source database (based on IB-OE) with
many SQL-99 features :
http://www.firebirdsql.org
http://www.firebirdsql.info
http://www.fingerbird.de/
http://www.comunidade-firebird.org/

Support list for Interbase and Firebird users :
firebird-support@yahoogroups.com

Nederlandse firebird nieuwsgroep :
news://80.126.130.81








---------------------------------
Yahoo! Groups Links

To visit your group on the web, go to:
http://groups.yahoo.com/group/firebird-support/

To unsubscribe from this group, send an email to:
firebird-support-unsubscribe@yahoogroups.com

Your use of Yahoo! Groups is subject to the Yahoo!
Terms of Service.






___________________________________________________________
Yahoo! Messenger - Communicate instantly..."Ping"
your friends today! Download Messenger Now
http://uk.messenger.yahoo.com/download/index.html