Subject Re: [firebird-support] SQL Performance help
Author Svein Erling Tysvaer
Hi Rob!

Unless one of the tables in your MAIN select (subselects don't count)
are compared to a constant value in your WHERE clause, the optimizer has
to use NATURAL for at least one table.

Hence, unless you change to JOIN bkw ... JOIN kw ..., there is no way
for you to avoid BK NATURAL.

Also, HAVING COUNT... sounds to me like something that you may want to
avoid.

Consider this statement:

SELECT bk.ISBN13, bk.FTI
FROM Book bk
JOIN BookKeyWord bkw1
ON bk.ISBN13 = bkw1.ISBN13
JOIN KeyWord kw1
ON bkw1.WordRef = kw1.WordRef
WHERE kw1.Word = 'WET'
AND EXISTS(
SELECT *
FROM BookKeyWord bkw2
JOIN KeyWord kw2
ON bkw2.WordRef = kw2.WordRef
WHERE bk.ISBN13 = bkw2.ISBN13
AND kw2.Word = 'WILD')

This can start the plan with kw1, link it to bkw1 and then to bk,
something which should be considerably faster than your original query,
as well as avoid duplicate records since each word hopefully only exist
once for each book in BookKeyWord. Adding a third word to your search
would simply require another AND EXISTS clause, something that normally
is quick (to make it even quicker, you may try to use the most selective
WORD for kw1, but I think that could easily require more effort to find
the selectivity than to actually execute the statement).

Try it,
Set

Robert martin wrote:
> Hi
>
> We have the following structure...
>
> Book (ISBN13 PK, ....) 50000 recs
>
> BookKeyWord (ISBN13, WordRef (CPK)) -> We have also created an index on
> WordRef as this improve performance on some queries. 790000 recs
>
> KeyWord (WordRef PK, Word) -> Word is indexed. 81000 recs
>
>
> We are trying to write a query that will return a list of books where
> the book contains all of a list of words. We have the following attemp
>
>
> SELECT bk.ISBN13, bk.FTI
> FROM Book bk
> WHERE bk.ISBN13 IN (
>
> SELECT bkw.ISBN13
> FROM KeyWord kw
> JOIN BookKeyWord bkw ON bkw.WordRef = kw.WordRef
> WHERE kw.Word = 'WET'
> OR kw.Word = 'WILD'
> GROUP BY bkw.ISBN13
> HAVING COUNT(bkw.ISBN13) = 2)
>
> Where this SQL will be generated in code. It takes 4 secs with the
> following plan
>
> PLAN SORT (JOIN (KW INDEX (KEYWORD_WORD,KEYWORD_WORD),BKW INDEX
> (BOOKKEYWORD_PK)))
> PLAN (BK NATURAL)
>
> The problem appears to be the index on ISBN13 on Book isn't being used.
> If I create a view based on the select used in the 'IN' part of the
> above and then join this view to book the performance is ms and all
> indexes are used.
>
> I have considered using UNION but can't find Union clause that only
> returns the intersection of two (or more) SQLs.
>
> I would greatly appreciate any suggestions on how to improve this.
> Ironically it is currently faster to create a temporary view, join of
> this, display results and then delete the view !
>
> We are expecting the data to be significantly larger in the future.