Subject RE: [firebird-support] Plan question
Author Svein Erling Tysvær
>> Which Firebird version are you using?
>I'm currently using version 2.1.4.18270.

OK, I think that means the optimizer knows about the selectivity of parts of the indexes, not only the index as a whole.

>I think that NATURAL will be never used if there is any possible index
>to use doesn't care about the selectivity of that index. I think
>selectivity is taken into consideration only when more than 1 index
>matches needs for query.

Possibly (I don't know, I normally don't index Boolean columns), but you have two possible indexes to choose from.

>> 2) The optimizer may not know that using an index on SRC_DIR_NAME, ID_CSYMBOL is >>equally good or better than an index using simply SRC_DIR_NAME (thinking about it, it
>>is obvious, but if all you're doing is choosing between two indexes with a selectivity
>>of 0.0000007 and 0.5, then you of course prefer the one with 0.0000007). If this is >>the case, you can try either:

>And thats strange IMHO. There is an index where only a part (column
>SRC_DIR_NAME) can be used and there is another index, that contains all
>columns, which are used in where condition.

Well, I was assuming Firebird 1.5, with 2.1 the optimizer knows about selectivity of parts of the index.

>> a) Don't prepare until you have to (possibly even updating the statistics shortly
>> before the prepare, I don't know whether that can help or not), hoping a better plan
>> will be chosen when you have final data

>That means I have to run SET STATISTIC query for this table after
>previous step. And I don't like this idea. In my mind this type of query
>belongs to database maintenance and not to standard client application.

In general I would agree, that comment was because I thought your program might have done massive insert/update/delete between preparing the query and actually selecting the records. And massive modifications to the database affects the best plan for the optimizer to choose.

>> b) Add NODE_ID to IX_ZLEVELS_DIR_SYM to increase selectivity hoping that would
>> improve the plan

>Yes - it will probably help, but I don't want to waste disk space and
>computing performace. This table will have like hundreds millions of
>records. Whole database will have like 200GBs or more when it will be
>filled with data

It could have been an idea in Firebird 1.5, but with 2.1 you have no reason to add a field to that index.

>> c) Remove NODE_ID from IX_ZLEVELS_DIR_NODE hoping that would improve the plan (though
>> this could slow down UPDATEs and DELETEs)

>This will slow down performace of some other steps significantly so I
>will not go for it.

Same as above, not important for Fb 2.1.

>> Other things you should consider, is the selectivity of IX_STREETS_LINK_ALT and
>> FK_STREETS_CSYMBOL, if FK_STREETS_CSYMBOL has considerably worse selectivity that
>> IX_STREETS_LINK_ALT you may consider preventing FK_STREETS_CSYMBOL from being used.

>I don't understand this. Both indexes are used and it's good as I know.
>It's much better to use that foreign key index instead of no index.

Sometimes Firebird (at least 1.5, I don't use 2.1 much myself, but would expect this problem to have become less of a problem rather than completely eliminated) decides to use one selective and one unselective index and the unselective index could slow things down. I quite frequently experience this at work when the optimizer thinks that an index on a PID number is not enough by itself.

>> Another thing you also could consider, is whether it is best to start with SRC or STR >> first in the PLAN. It might be that FROM STREETS STR LEFT JOIN ZLEVELS SRC is a
>> better choice (but that depends on your data, the optimizer thinks that it is best to
>> start with SRC, but it has no knowledge about the selectivity of the values 1360 vs
>> 2000, 2010 and C1, C4).

>ZLEVELS - over 7 millions of records, STREETS - almost 3 millions. In
>table ZLEVELS are only 2 distinct values of SRC_DIR_NAME - 'C1' and 'C4'
>(at this time), around 40000 records with ID_CSYMBOL IN (2000, 2010).

40000 records isn't much, I expect there to be considerably more with STR.ID_CSYMBOL = 1360? How selective is the JOIN clause (primarily I'm interested in LINK_ID of both tables, I assume ALTSTREETS = 'F' to have poor selectivity)?

Set