Subject RE: [firebird-support] Major query performance problems
Author Svein Erling Tysvær
>>> First thing I would try, _especially_ since you use inner joins, would
>>> be changing the order of the tables: Instead of
>>> SELECT a.id, b.id FROM a JOIN b WHERE ...
>>> try
>>> SELECT a.id, b.id FROM b JOIN a WHERE ...
>>
>> The order of tables in an inner join is not significant to the
>> Firebird optimizer.
>
>That may be so in theory, but in practice it _is_ significant.

Are you sure, Pepak? My experience tells me (albeit I'm mostly on Firebird 1.5) that as long as all joins in a query are inner joins (add one left join anywhere except at the end of the query and things are different) that the optimizer places the tables in the order it finds most suitable in the PLAN regardless of which table is first in the query.

>>> Second thing, it's been my experience that inner joins tend to be much
>>> slower than outer joins with Firebird (probably a problem with my
>>> optimizations, but the end result is that I am getting much better
>>> performance from SELECT a.id, b.id FROM a LEFT JOIN b WHERE ... than
>>> from SELECT a.id, b.id FROM a JOIN b WHERE ...
>>
>> You really ought to look at the plans and the index statistics.
>> What your seeing is not normal.
>
>Again, it may not be normal in theory, but it is common in practice.
>Sure, plans are the problem, but every time I checked them, they seemed
>quite reasonable.

LEFT JOIN gives the optimizer less choice as to which plan it can choose and can be used as an alternative to +0 or ||'' when doing 'manual optimization' (though take care, the result set may be different if the right side table isn't mentioned in the WHERE clause). Preventing the optimizer from using certain indexes (which is a result of using LEFT JOIN) can be useful, but normally one ought to first try [INNER] JOIN and only change to LEFT [OUTER] JOIN if the plan looks suboptimal. Sure, the optimizer sometimes makes mistakes, but recommending people to generally use LEFT JOIN rather than [INNER] JOIN will result in 'random performance' rather than 'good performance most of the time'. Also, I expect the optimizer to make much better plans once we get histograms (Firebird 3), and I fear that LEFT JOIN (or +0 or ||'') may mean that you miss that improvement (i.e. that Firebird will not be able to use the ideal plan).

>> My recollection is that an outer join with conditions on both
>> tables is turned into an inner join.
>
>Again, maybe. But it still tends to improve the performance a lot.

Here, I think what Ann is saying is partially wrong (the mother of InterBase is among the very few I've never ever seen write anything incorrect before, but at least my experience with Fb 1.5 differs slightly from her statement above). She is right in that the result set in the actual query in such cases will be identical regardless of type of join (except where checking for NULL or using OR rather than AND), but LEFT JOIN prevents the optimizer from putting the right table before the left table in the plan, so it can have an effect on execution time (for better or for worse).

(adding a bit from Anns other reply)
>OK, as Svein said, you've got a field with two values ... and probably a
>preponderance of one over the other. Firebird handles that case badly.
>The work around, as he said, is to make the index look unusable
>if you're selecting on the value that is vastly more common.

Did I say that? OK, maybe I did. However, from the timing Alec reports, I think that PC.IS_MARKED_DELETED = 1 is pretty selective and that that index ought to be used and that avoiding the index only ought to be done if matching PC.IS_MARKED_DELETED against the other value.

> That's the same plan as the first query, and yet the results are very different... Odd indeed.

Oh, 30 mins vs 23 seconds - I never noticed that - I thought the difference was only 7 seconds. Has the timing been verified by running each query several times and where there lots of other things going on in the database while the queries were run? Are PHYSICAL_COPY and COPY tables or views? That kind of difference surely puzzles us... Though the prepare time for the first two queries also seem terrible, why would you need 1-3 seconds to prepare statements involving two tables (assuming they're not complex views) with a simple join and one or two fields referenced in the where clause? The last 0.094 prepare time sounds a lot more sensible than 1.203 or 2.625.

Set