Subject Re: [firebird-support] SQL Performance help
Author Robert martin
Your original solution is good. Performance isn't too bad on 3m
records. Turning off the index below kills the performance however.

Thanks for your help :)

Rob Martin
Software Engineer

phone +64 03 377 0495
fax +64 03 377 0496
web www.chreos.com

Wild Software Ltd



Svein Erling Tysvaer wrote:
> Hmm, assuming that the keyword exists in more books than each book has
> keywords, you may try optimizing a bit further:
>
> SELECT bk.ISBN13, bk.FTI
> FROM Book bk
> JOIN BookKeyWord bkw1
> ON bk.ISBN13 = bkw1.ISBN13
> JOIN KeyWord kw1
> ON bkw1.WordRef = kw1.WordRef
> WHERE kw1.Word = 'WET'
> AND EXISTS(
> SELECT *
> FROM BookKeyWord bkw2
> JOIN KeyWord kw2
> ON bkw2.WordRef = kw2.WordRef
> WHERE bk.ISBN13 = bkw2.ISBN13
> AND kw2.Word||'' = 'WILD')
>
> Though this is just a guess, I haven't seen your plan and don't know
> about the selectivity of your indexes.
>
> Set
>
> Robert martin wrote:
>
>> HI Svein
>>
>> That is a nice solution. My mind tells me it should be faster than my
>> count(*) SQL however in testing they seem to come out with about the
>> same performance. We are building a larger dataset to work with so I
>> will test it with more lifelike data and let you know.
>>
>> My current test data has 790000 records in BookKeyWord. We estimate the
>> new test data (when loading is complete) will have about 3m records in
>> this table.
>>
>>
>> Rob Martin
>> Software Engineer
>>
>> phone +64 03 377 0495
>> fax +64 03 377 0496
>> web www.chreos.com
>>
>> Wild Software Ltd
>>
>>
>>
>> Svein Erling Tysvaer wrote:
>>
>>> Hi Rob!
>>>
>>> Unless one of the tables in your MAIN select (subselects don't count)
>>> are compared to a constant value in your WHERE clause, the optimizer has
>>> to use NATURAL for at least one table.
>>>
>>> Hence, unless you change to JOIN bkw ... JOIN kw ..., there is no way
>>> for you to avoid BK NATURAL.
>>>
>>> Also, HAVING COUNT... sounds to me like something that you may want to
>>> avoid.
>>>
>>> Consider this statement:
>>>
>>> SELECT bk.ISBN13, bk.FTI
>>> FROM Book bk
>>> JOIN BookKeyWord bkw1
>>> ON bk.ISBN13 = bkw1.ISBN13
>>> JOIN KeyWord kw1
>>> ON bkw1.WordRef = kw1.WordRef
>>> WHERE kw1.Word = 'WET'
>>> AND EXISTS(
>>> SELECT *
>>> FROM BookKeyWord bkw2
>>> JOIN KeyWord kw2
>>> ON bkw2.WordRef = kw2.WordRef
>>> WHERE bk.ISBN13 = bkw2.ISBN13
>>> AND kw2.Word = 'WILD')
>>>
>>> This can start the plan with kw1, link it to bkw1 and then to bk,
>>> something which should be considerably faster than your original query,
>>> as well as avoid duplicate records since each word hopefully only exist
>>> once for each book in BookKeyWord. Adding a third word to your search
>>> would simply require another AND EXISTS clause, something that normally
>>> is quick (to make it even quicker, you may try to use the most selective
>>> WORD for kw1, but I think that could easily require more effort to find
>>> the selectivity than to actually execute the statement).
>>>
>>> Try it,
>>> Set
>>>
>>> Robert martin wrote:
>>>
>>>
>>>> Hi
>>>>
>>>> We have the following structure...
>>>>
>>>> Book (ISBN13 PK, ....) 50000 recs
>>>>
>>>> BookKeyWord (ISBN13, WordRef (CPK)) -> We have also created an index on
>>>> WordRef as this improve performance on some queries. 790000 recs
>>>>
>>>> KeyWord (WordRef PK, Word) -> Word is indexed. 81000 recs
>>>>
>>>>
>>>> We are trying to write a query that will return a list of books where
>>>> the book contains all of a list of words. We have the following attemp
>>>>
>>>>
>>>> SELECT bk.ISBN13, bk.FTI
>>>> FROM Book bk
>>>> WHERE bk.ISBN13 IN (
>>>>
>>>> SELECT bkw.ISBN13
>>>> FROM KeyWord kw
>>>> JOIN BookKeyWord bkw ON bkw.WordRef = kw.WordRef
>>>> WHERE kw.Word = 'WET'
>>>> OR kw.Word = 'WILD'
>>>> GROUP BY bkw.ISBN13
>>>> HAVING COUNT(bkw.ISBN13) = 2)
>>>>
>>>> Where this SQL will be generated in code. It takes 4 secs with the
>>>> following plan
>>>>
>>>> PLAN SORT (JOIN (KW INDEX (KEYWORD_WORD,KEYWORD_WORD),BKW INDEX
>>>> (BOOKKEYWORD_PK)))
>>>> PLAN (BK NATURAL)
>>>>
>>>> The problem appears to be the index on ISBN13 on Book isn't being used.
>>>> If I create a view based on the select used in the 'IN' part of the
>>>> above and then join this view to book the performance is ms and all
>>>> indexes are used.
>>>>
>>>> I have considered using UNION but can't find Union clause that only
>>>> returns the intersection of two (or more) SQLs.
>>>>
>>>> I would greatly appreciate any suggestions on how to improve this.
>>>> Ironically it is currently faster to create a temporary view, join of
>>>> this, display results and then delete the view !
>>>>
>>>> We are expecting the data to be significantly larger in the future.
>>>>
>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> Visit http://www.firebirdsql.org and click the Resources item
> on the main (top) menu. Try Knowledgebase and FAQ links !
>
> Also search the knowledgebases at http://www.ibphoenix.com
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> Yahoo! Groups Links
>
>
>
>
>
>
>