Subject Re: [firebird-support] Determining Join Order
Author Svein Erling Tysvaer
Hi Tom!

My knowledge of selects are based on practical experience with Fb 1.5,
so my answer will not answer your questions!

Let's take the simple case first - scenario 2. The subselect has to be
executed for each potential row and I think of this as 'first get some
information about which rows of 'document' that has docid > 7000, then
run the subselect for each of these rows. Normally, I find that such
queries execute quickly in Firebird, though of course it can be slow if
you have millions of documents which contains lots of words each.

In scenario 1, the optimizer has a lot more freedom as to which plan to
choose. It may choose to start with dw1 or dw2 (which to me seems like
virtually identically good options), but it may also decide to start
with going NATURAL on d, something that may or may not be a problem (if
it is a problem, it should at least be simple to tweak the plan by using
dw1.docid+0).

Another potential problem (in both scenarios), is that the optimizer may
choose to use two indexes on dw1 or dw2 - something that may be OK or
extremely time consuming depending on the selectivity of docid and
wordid. It should use an index for the one of these with the best
selectivity, and could use both if the second index has a similar
selectivity. However, I have several times experienced that Firebird 1.5
choose to use all available indexes, where one of the indexes is a
virtually unique index whereas the other index(es) have very poor
selectivity (and there are more than a million rows in these tables).
Whenever I open such queries without preventing the poor selectivity
index from being used, I get lots of time pondering: 'How could I be
this stupid, yet again?'.

As for the quote you found, I'd say that it is true for the optimizer -
it better be good at finding us the optimal plan to avoid us ditching
Firebird! However, for me as a Firebird user, I don't care about streams
at all, I just trust the optimizer to take care of that. On those
relatively rare occasions where the optimizer fails to come up with a
good plan I try to help it by tweaking my query, but normally I just
take a quick look at the suggested plan and happily accept what the
optimizer suggests.

HTH,
Set

Tom Conlon wrote:
> http://www.ibphoenix.com/main.nfs?a=ibphoenix&page=ibp_quep
>
> '...In joining two or more streams together, it is often crucial to
> determine which stream should be retrieved first. In general, the
> stream which is the most expensive to retrieve should be retrieved
> first. If you think of a join as a set of nested loops, it makes sense
> that the innermost loop will be executed the most times. Similarly,
> the last stream in the join list will be fetched the most times, so it
> had better be the cheapest to retrieve.'
>
> Is this still the case (FB 1.5 & 2.0)?
>
>> the last stream in the join list will be fetched the most times
>
> Can someone confirm what the 'last stream' in the following 2 cases:
>
> 1.
> SELECT distinct d.docid
> FROM document d
> JOIN documentwords dw1 ON (d.docid=dw1.docid)
> JOIN documentwords dw2 ON (dw2.docid=dw1.docid) AND (dw2.WORDID=338)
> WHERE (dw1.wordid=301) <-- does this exec first?
>
> 2.
> SELECT distinct d.docid
> FROM document d
> WHERE d.docid>7000 and <-- does this exec first?
> EXISTS
> (
> SELECT 1 from documentwords dw1
> WHERE (dw1.docid=d.docid) AND (dw1.WORDID=338)
> )
>
> Thanks,
> Tom