firebird-support - Re: [firebird-support] SQL Performance help

Subject	Re: [firebird-support] SQL Performance help
Author	Svein Erling Tysvaer
Post date	2006-11-06T22:57:30Z

Hmm, assuming that the keyword exists in more books than each book has
keywords, you may try optimizing a bit further:

SELECT bk.ISBN13, bk.FTI
FROM Book bk
JOIN BookKeyWord bkw1
ON bk.ISBN13 = bkw1.ISBN13
JOIN KeyWord kw1
ON bkw1.WordRef = kw1.WordRef
WHERE kw1.Word = 'WET'
AND EXISTS(
SELECT *
FROM BookKeyWord bkw2
JOIN KeyWord kw2
ON bkw2.WordRef = kw2.WordRef
WHERE bk.ISBN13 = bkw2.ISBN13
AND kw2.Word||'' = 'WILD')

Though this is just a guess, I haven't seen your plan and don't know
about the selectivity of your indexes.

Set

Robert martin wrote:

> HI Svein
>
> That is a nice solution. My mind tells me it should be faster than my
> count(*) SQL however in testing they seem to come out with about the
> same performance. We are building a larger dataset to work with so I
> will test it with more lifelike data and let you know.
>
> My current test data has 790000 records in BookKeyWord. We estimate the
> new test data (when loading is complete) will have about 3m records in
> this table.
>
>
> Rob Martin
> Software Engineer
>
> phone +64 03 377 0495
> fax +64 03 377 0496
> web www.chreos.com
>
> Wild Software Ltd
>
>
>
> Svein Erling Tysvaer wrote:
>> Hi Rob!
>>
>> Unless one of the tables in your MAIN select (subselects don't count)
>> are compared to a constant value in your WHERE clause, the optimizer has
>> to use NATURAL for at least one table.
>>
>> Hence, unless you change to JOIN bkw ... JOIN kw ..., there is no way
>> for you to avoid BK NATURAL.
>>
>> Also, HAVING COUNT... sounds to me like something that you may want to
>> avoid.
>>
>> Consider this statement:
>>
>> SELECT bk.ISBN13, bk.FTI
>> FROM Book bk
>> JOIN BookKeyWord bkw1
>> ON bk.ISBN13 = bkw1.ISBN13
>> JOIN KeyWord kw1
>> ON bkw1.WordRef = kw1.WordRef
>> WHERE kw1.Word = 'WET'
>> AND EXISTS(
>> SELECT *
>> FROM BookKeyWord bkw2
>> JOIN KeyWord kw2
>> ON bkw2.WordRef = kw2.WordRef
>> WHERE bk.ISBN13 = bkw2.ISBN13
>> AND kw2.Word = 'WILD')
>>
>> This can start the plan with kw1, link it to bkw1 and then to bk,
>> something which should be considerably faster than your original query,
>> as well as avoid duplicate records since each word hopefully only exist
>> once for each book in BookKeyWord. Adding a third word to your search
>> would simply require another AND EXISTS clause, something that normally
>> is quick (to make it even quicker, you may try to use the most selective
>> WORD for kw1, but I think that could easily require more effort to find
>> the selectivity than to actually execute the statement).
>>
>> Try it,
>> Set
>>
>> Robert martin wrote:
>>
>>> Hi
>>>
>>> We have the following structure...
>>>
>>> Book (ISBN13 PK, ....) 50000 recs
>>>
>>> BookKeyWord (ISBN13, WordRef (CPK)) -> We have also created an index on
>>> WordRef as this improve performance on some queries. 790000 recs
>>>
>>> KeyWord (WordRef PK, Word) -> Word is indexed. 81000 recs
>>>
>>>
>>> We are trying to write a query that will return a list of books where
>>> the book contains all of a list of words. We have the following attemp
>>>
>>>
>>> SELECT bk.ISBN13, bk.FTI
>>> FROM Book bk
>>> WHERE bk.ISBN13 IN (
>>>
>>> SELECT bkw.ISBN13
>>> FROM KeyWord kw
>>> JOIN BookKeyWord bkw ON bkw.WordRef = kw.WordRef
>>> WHERE kw.Word = 'WET'
>>> OR kw.Word = 'WILD'
>>> GROUP BY bkw.ISBN13
>>> HAVING COUNT(bkw.ISBN13) = 2)
>>>
>>> Where this SQL will be generated in code. It takes 4 secs with the
>>> following plan
>>>
>>> PLAN SORT (JOIN (KW INDEX (KEYWORD_WORD,KEYWORD_WORD),BKW INDEX
>>> (BOOKKEYWORD_PK)))
>>> PLAN (BK NATURAL)
>>>
>>> The problem appears to be the index on ISBN13 on Book isn't being used.
>>> If I create a view based on the select used in the 'IN' part of the
>>> above and then join this view to book the performance is ms and all
>>> indexes are used.
>>>
>>> I have considered using UNION but can't find Union clause that only
>>> returns the intersection of two (or more) SQLs.
>>>
>>> I would greatly appreciate any suggestions on how to improve this.
>>> Ironically it is currently faster to create a temporary view, join of
>>> this, display results and then delete the view !
>>>
>>> We are expecting the data to be significantly larger in the future.