Subject Re: [firebird-support] Re: Subqueries and indexes
Author Svein Erling Tysvaer
yoav112003 wrote:
> Shouldn't the 'WHERE events.ROWID IN ...' clause be a hint to use the
> index on this field?

It is, but which place in the index should it look at? Using IN
(subselect) means that Firebird goes through EVERY row of EVENTS,
calculates the subselect and sees if the result matches ROWID.

events.ROWID IN (5, 7, 9) can just look up 5, 7, and 9 in the rowid
index, Firebird doesn't know how to do the same with IN (subselect). In
general, subselects are executed for every potential row - even though
it in some cases could have been executed only once. Think of the case

SELECT distinct ROWID FROM events
WHERE events.ROWID IN (
SELECT DESTINATION.EVENTID FROM DESTINATION
WHERE ( (DESTINATION.IP = events.IP))
)

There is no way to use an index when the values aren't fixed in one way
or the other.

An inner join can be a lot better. Another option would be for you to
use a stored procedure with two FOR SELECT loops. First get the
DESTINATION.EVENTID and then events.ROWID.

HTH,
Set

> As you can read in my reply to Milan, I tried to use a custom plan
> with this query which used this index, but I got an error message.
>
> The query suggested by Milan and Helen have the same performance as
> the my query, my guess is because they too require a full table scan
> on the events table.
>
> It seems that the best solution is to use an INNER JOIN query if
> possible, but I am still curious to know the reason the ROWID index
> cannot be used.
>
> --- In firebird-support@yahoogroups.com, Svein Erling Tysvaer
> <svein.erling.tysvaer@...> wrote:
>> Helen and Milan helped you get rid of the irrelevant IDX_DESTINATION2
>> index, but didn't answer your WHY question.
>>
>> Let's pretend we're 'optimizers' who should find a plan for the main
>> part of your query.
>>
>> First, how do we limit the rows? Ah, EVENTS.ROWID does that and that
>> field has an index! What should it be? Hmm, it should have one of the
>> values in a subselect... Well, I'm not clever enough to know if the
>> subselect is constant or varies for each row, hence I cannot use an
>> index for that requirement. No further criteria for EVENTS? OK, I'll go
>> NATURAL then. (I skipped the bit with why it doesn't use the index for
>> the DISTINCT - I think the reason for the subselect, but not the main
>> select using an INDEX for the DISTINCT, is that the subselect can
> create
>> a bitmap of the EVENTID and IP indexes, whereas combining NATURAL with
>> any index is nonsense.)
>>
>> Now, I don't know how clever the optimizer has to be to meet the
> 'clever
>> enough' criteria. Clearly, in your case it could find the
>> DESTINATION.EVENTID "subresult set" only once, but (sub)queries can be
>> much more complicated and Firebird has to handle every possible
>> situation. Currently it doesn't recognize whether subselects are
>> completely independent from the main select (and could be converted
> into
>> something like a join), I have no idea of whether it will be able to do
>> this in the future.
>>
>> Set
>> -friend of the optimizer
>>
>> yoav112003 wrote:
>>> I am trying to run the following query (FB 2 RC3):
>>>
>>> SELECT distinct ROWID FROM events
>>> WHERE events.ROWID IN (
>>>
>>> SELECT DISTINCT DESTINATION.EVENTID FROM DESTINATION
>>> WHERE ( (DESTINATION.IP ='158.125.73.220'))
>>> )
>>>
>>> The default plan that is used is:
>>> PLAN SORT ((DESTINATION INDEX (IDX_DESTINATION2, IDX_DESTINATION1)))
>>> PLAN SORT ((SEAM_EVENT NATURAL))
>>>
>>> where IDX_DESTINATION2, IDX_DESTINATION1 are indexes on
>>> DESTINATION.EVENTID and DESTINATION.IP respectively.
>>>
>>> This plan is very ineffecient since the plan does not rely on the
>>> index for events.ROWID. When I try to use a customized plan with this
>>> index, FB replies that the index cannot be used in this query.
>>>
>>> Is there anything else that can be done (except for using INNER JOIN)?
>>> Why can't FB use the indexes on the events table?
>>>
>>> Thanks.