firebird-architect - Re: [IB-Architect] Firebird Query Speed improvement

Subject	Re: [IB-Architect] Firebird Query Speed improvement
Author	Jim Starkey
Post date	2001-10-31T18:11:06Z

At 01:56 AM 10/30/01 +0100, Ivan Prenosil wrote:

>> From: Jim Starkey
>> The whole concept of plans is a crock -- a wart on a wart,
>> an abomination, a crime against software, an feable excuse
>> of a feature to mask a bug in the optimizer.
>>
>
>> How would anyone feel about a C compiler that required the
>> programmer to tell it how to optimize loops?
>
>Jim, you must be joking when you mention C here!
>The most distinctive feature of C is its pointer arithmetic,
>which is nothing more than the means of hand-optimizing
>(how much C programmers choose accessing "arrays" by index
>instead of direct pointer to its elements ?).
>Similarly, what is the purpose of "register" or "inline" in C
>other than to allow hand-optimization ?

OK, let's look at this. The "inline" case is control over
the space vs. speed tradeoff, which is obviously beyond the
scope of a compiler. The "register" declaration is a pretty
good analogy, however. Although it was marginally useful
for early primitive C compilers (pcc comes to mind), it is
ignored for all but error checking in modern C/C++ compilers
for very good reasons, in specific, that register allocation
is the essence of code generation for modern micro-processors,
and even made compilers have more information than the smartest
humans. When the compiler is worrying about blocking interactions
between superscalar instructions streams on parallel arithmetic
units, advice about frequently referenced variables is rather
pointless.

The Interbase optimizer is more like a compiler code generator
than an "inline" declaration (though the tradeoff of time to
first record for overall efficiency is a valid counterexample).
The optimizer has available to it the definitions of all indexes,
a good guess at table cardinalities, and a very good notion of
the internal costs of various operations. It has a mediocre
knowledge of index selectivities (other than unique indexes,
of course) and no knowledge at all of selectivities of non-
indexed conjuncts. The Interbase optimizer, unlike most
none-JRD systems, doesn't need to worry about index selection,
which is generally the hardest problem.

The Interbase works in two distinct phases. The first determines
join order, the second determines detailed strategy for execution.
The join order determination is based by determining costs of
the various permutations by modelling the second phase for
a particular order (a tree pruning algorithm is used to avoid
a full combinatorial expansion). For each table in candidate
and estimate is made consisting of the estimated the number
times that step will be evaluated, the estimated cost to do
the retrieval from the stream. It also estimates the number
of records coming out of the step which is passed to the next
table in to order for its calculation.

The join order calculation can screw up if it guesses wrong
about cardinalities, the distribution of values is heavily
skewed, or the estimated cardinalities are wrong (precise
count is too expensive to maintain). It can also screw up
if his model of the second phase is incorrect (this is what
we call a "bug").

The second phase of optimization is intrinsically simple.
It loops through the tables in final join order finding any
and all indexes that apply. It would be better (and simple
to implement) if it identified redundant indexes and used only
the more specific, but that's another topic.

I strongly suspect that the optimizer bug is either a straightforward
screwup in the second phase, a trick added to the second phase
that didn't make it into the first phase model of the second phase,
or, most likely, a stupid piece of code in the second phase that
says "hey, I recognize this special case and I'll doing something
that seem very clever at the time it was written."

I'm about 98% convinced that the problem is an out and out bug rather
than an calculation based on unreliable information (this possibility
exists, however).

The fundamental problem with the plan crock, however, is that it
masks the bug. Folklore says the optimizer is stupid, override it
any everything will be ok. So the cognescenti use plans and the
bug isn't addressed. Folks evaluating Firebird, especially ones
that understand the technology, take a sniff, smell something
very, very rotten, and go someplace else (they're welcome at
Netfrastructure if they have enough money).

Firebird will never be considered a serious product by knowledgable
developers as long as it is dependent on plans. Those people are
not on the Firebird or Interbase lists because they are using
Oracle or SqlServer or Sybase.

If you want Firebird to succeed, it must appear credible to
folks evaluating databases. Plans are not credible.

Fix the damn bug, please.

Jim Starkey