Subject | Re: [firebird-support] SQL Performance help |
---|---|
Author | Robert martin |
Post date | 2006-11-07T00:46:35Z |
Your original solution is good. Performance isn't too bad on 3m
records. Turning off the index below kills the performance however.
Thanks for your help :)
Rob Martin
Software Engineer
phone +64 03 377 0495
fax +64 03 377 0496
web www.chreos.com
Wild Software Ltd
Svein Erling Tysvaer wrote:
records. Turning off the index below kills the performance however.
Thanks for your help :)
Rob Martin
Software Engineer
phone +64 03 377 0495
fax +64 03 377 0496
web www.chreos.com
Wild Software Ltd
Svein Erling Tysvaer wrote:
> Hmm, assuming that the keyword exists in more books than each book has
> keywords, you may try optimizing a bit further:
>
> SELECT bk.ISBN13, bk.FTI
> FROM Book bk
> JOIN BookKeyWord bkw1
> ON bk.ISBN13 = bkw1.ISBN13
> JOIN KeyWord kw1
> ON bkw1.WordRef = kw1.WordRef
> WHERE kw1.Word = 'WET'
> AND EXISTS(
> SELECT *
> FROM BookKeyWord bkw2
> JOIN KeyWord kw2
> ON bkw2.WordRef = kw2.WordRef
> WHERE bk.ISBN13 = bkw2.ISBN13
> AND kw2.Word||'' = 'WILD')
>
> Though this is just a guess, I haven't seen your plan and don't know
> about the selectivity of your indexes.
>
> Set
>
> Robert martin wrote:
>
>> HI Svein
>>
>> That is a nice solution. My mind tells me it should be faster than my
>> count(*) SQL however in testing they seem to come out with about the
>> same performance. We are building a larger dataset to work with so I
>> will test it with more lifelike data and let you know.
>>
>> My current test data has 790000 records in BookKeyWord. We estimate the
>> new test data (when loading is complete) will have about 3m records in
>> this table.
>>
>>
>> Rob Martin
>> Software Engineer
>>
>> phone +64 03 377 0495
>> fax +64 03 377 0496
>> web www.chreos.com
>>
>> Wild Software Ltd
>>
>>
>>
>> Svein Erling Tysvaer wrote:
>>
>>> Hi Rob!
>>>
>>> Unless one of the tables in your MAIN select (subselects don't count)
>>> are compared to a constant value in your WHERE clause, the optimizer has
>>> to use NATURAL for at least one table.
>>>
>>> Hence, unless you change to JOIN bkw ... JOIN kw ..., there is no way
>>> for you to avoid BK NATURAL.
>>>
>>> Also, HAVING COUNT... sounds to me like something that you may want to
>>> avoid.
>>>
>>> Consider this statement:
>>>
>>> SELECT bk.ISBN13, bk.FTI
>>> FROM Book bk
>>> JOIN BookKeyWord bkw1
>>> ON bk.ISBN13 = bkw1.ISBN13
>>> JOIN KeyWord kw1
>>> ON bkw1.WordRef = kw1.WordRef
>>> WHERE kw1.Word = 'WET'
>>> AND EXISTS(
>>> SELECT *
>>> FROM BookKeyWord bkw2
>>> JOIN KeyWord kw2
>>> ON bkw2.WordRef = kw2.WordRef
>>> WHERE bk.ISBN13 = bkw2.ISBN13
>>> AND kw2.Word = 'WILD')
>>>
>>> This can start the plan with kw1, link it to bkw1 and then to bk,
>>> something which should be considerably faster than your original query,
>>> as well as avoid duplicate records since each word hopefully only exist
>>> once for each book in BookKeyWord. Adding a third word to your search
>>> would simply require another AND EXISTS clause, something that normally
>>> is quick (to make it even quicker, you may try to use the most selective
>>> WORD for kw1, but I think that could easily require more effort to find
>>> the selectivity than to actually execute the statement).
>>>
>>> Try it,
>>> Set
>>>
>>> Robert martin wrote:
>>>
>>>
>>>> Hi
>>>>
>>>> We have the following structure...
>>>>
>>>> Book (ISBN13 PK, ....) 50000 recs
>>>>
>>>> BookKeyWord (ISBN13, WordRef (CPK)) -> We have also created an index on
>>>> WordRef as this improve performance on some queries. 790000 recs
>>>>
>>>> KeyWord (WordRef PK, Word) -> Word is indexed. 81000 recs
>>>>
>>>>
>>>> We are trying to write a query that will return a list of books where
>>>> the book contains all of a list of words. We have the following attemp
>>>>
>>>>
>>>> SELECT bk.ISBN13, bk.FTI
>>>> FROM Book bk
>>>> WHERE bk.ISBN13 IN (
>>>>
>>>> SELECT bkw.ISBN13
>>>> FROM KeyWord kw
>>>> JOIN BookKeyWord bkw ON bkw.WordRef = kw.WordRef
>>>> WHERE kw.Word = 'WET'
>>>> OR kw.Word = 'WILD'
>>>> GROUP BY bkw.ISBN13
>>>> HAVING COUNT(bkw.ISBN13) = 2)
>>>>
>>>> Where this SQL will be generated in code. It takes 4 secs with the
>>>> following plan
>>>>
>>>> PLAN SORT (JOIN (KW INDEX (KEYWORD_WORD,KEYWORD_WORD),BKW INDEX
>>>> (BOOKKEYWORD_PK)))
>>>> PLAN (BK NATURAL)
>>>>
>>>> The problem appears to be the index on ISBN13 on Book isn't being used.
>>>> If I create a view based on the select used in the 'IN' part of the
>>>> above and then join this view to book the performance is ms and all
>>>> indexes are used.
>>>>
>>>> I have considered using UNION but can't find Union clause that only
>>>> returns the intersection of two (or more) SQLs.
>>>>
>>>> I would greatly appreciate any suggestions on how to improve this.
>>>> Ironically it is currently faster to create a temporary view, join of
>>>> this, display results and then delete the view !
>>>>
>>>> We are expecting the data to be significantly larger in the future.
>>>>
>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> Visit http://www.firebirdsql.org and click the Resources item
> on the main (top) menu. Try Knowledgebase and FAQ links !
>
> Also search the knowledgebases at http://www.ibphoenix.com
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> Yahoo! Groups Links
>
>
>
>
>
>
>