Subject Re: [firebird-support] Doubt with optimization in Query
Author Ann Harrison
2011/9/13 Svein Erling Tysvær <svein.erling.tysvaer@...>

>
>
> Firebird doesn't have histograms yet, so it knows nothing about which field
> values are common and which are rare when it chooses between using the index
> of a probably reasonably selective field (I assume you have lots of records
> in maestro_turnos and that 300K is only a small fraction)
> ...
> In the future, I think Firebird will implement histograms (maybe it is
> already part of Firebird 3?), and then I guess it will make the right
> decision in cases like yours.
>

In case there's someone out there wondering what histograms are in this
context and what they have to do with optimization, let me elaborate a bit.
Firebird keeps the selectivity of indexes and in the case of compound
indexes, the selectivity of each field in the key. The selectivity is the
number of distinct values compared to the total number of entries in the
index. So, if you've got a table with 100,000 records and one of the
fields has five distinct values, an index on that field would have a
terrible selectivity. On the other hand, if 999,900 records had one value
for the field and the other four values were distributed among the remaining
100 records, then for those four values, the index would have good
selectivity. A histogram records the distribution of values in an index,
so it's a much more sensitive way of determining the usefulness of an index,
providing you know the values you're looking for.

Different databases optimize queries at different times. Firebird optimizes
its queries when they are presented so for joins and prepared queries with
parameters, it doesn't know all the input values. Some other systems delay
optimization until all parameters are known, so the inner part of a join is
optimized for each new value from the outer part of the join and a
parameterized query is optimized for each new value of the parameters.
Systems like that can make great use of histograms, but at a significant
cost for re-optimization.

If I understood the query presented, it was looking for records in a
specific range of dates. Range queries are particularly awkward to optimize
based on selectivity. And the dates were explicit. So for that particular
query, histograms would be a great help.


Good luck,

Ann


[Non-text portions of this message have been removed]