Subject | RE: [IB-Architect] Disk Bandwidth was License Question
---|---
Author | Jim Starkey
Post date | 2000-03-26T17:53:55Z
At 12:41 AM 3/26/00 -0400, Claudio Valderrama C. wrote:
>From: "Claudio Valderrama C." <cvalde@...>
>
> Speaking about optimizations and indexes, Jim, can you write a bit about
>indexes and the query optimizer, please? There has been some controversy on
>how Interbase takes into account indexes to generate optimum query plans.
Had I my druthers I'd duck this one; the current regime and I don't see
eye to eye. I'll start with the big picture, then bad-mouth plans.
The Interbase optimizer has three general phases: join order selection,
index join generation (streams), and joining of streams
into rivers by sort/merge.
Join order selection is cost based, with the optimizer making a crude
guess of the number of pages necessary to get to a record. This
involves guesses of the cardinality of the various tables and the
selectivity of indexes, none of which is known with any accuracy.
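To make that concrete, here's a rough sketch of the kind of crude page
estimate I mean. The formula and the names (cardinality, indexSelectivity,
indexDepth) are my own illustration, not the actual Interbase arithmetic:

```cpp
// Hypothetical sketch of a crude per-stream cost guess: walk the index
// (indexDepth pages), then fetch roughly one data page per surviving
// record.  Every input is itself a guess, so the result is too.
struct StreamStats {
    double cardinality;       // guessed row count of the table
    double indexSelectivity;  // guessed fraction of rows an index scan keeps
    int    indexDepth;        // guessed levels in the index B-tree
};

double estimatePages(const StreamStats &s)
{
    double matchingRecords = s.cardinality * s.indexSelectivity;
    return s.indexDepth + matchingRecords;
}
```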
In theory this is simple: consider all possible join orders and
pick the cheapest. And this is what V1 did. Worked pretty well.
Then somebody threw a 17-way join at it. It still worked pretty well --
the resultant query executed in less than 10 seconds. Unfortunately,
the optimization phase took 14 hours.
So the optimizer got a lot smarter. It learned to always take
unique indexes ahead of non-unique and non-indexed, and at
every stage to compute the best case for the remaining subtree to
compare against the current best solution. After a couple of
days of reasonably hard work the 14-hour optimization came down
to about 45 seconds with the same answer.
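The pruning idea is essentially branch-and-bound over join orders: stop
exploring a partial order as soon as its cost plus an optimistic bound on
the rest can't beat the best complete order found so far. The sketch below
is my own toy illustration of that idea, not the Interbase optimizer; the
cost model and names (streamCost, stepCost, bound) are all assumptions:

```cpp
#include <limits>
#include <vector>

// Hypothetical branch-and-bound over join orders; not the Interbase code.
// streamCost[s] is a crude per-stream page guess.  In this toy model a
// stream joined later is cheaper (earlier joins have narrowed the result),
// and the bound assumes every remaining stream joins at the cheapest
// possible position, so it never overstates the true remaining cost.
struct JoinOrderSearch {
    std::vector<double> streamCost;
    double bestCost = std::numeric_limits<double>::infinity();
    std::vector<int> bestOrder;

    double stepCost(int stream, size_t position) const {
        return streamCost[stream] / double(position + 1);
    }

    double bound(const std::vector<int> &remaining) const {
        double b = 0;
        for (int s : remaining)
            b += streamCost[s] / double(streamCost.size());  // best case
        return b;
    }

    void search(std::vector<int> &joined, std::vector<int> &remaining,
                double costSoFar) {
        if (remaining.empty()) {                 // a complete join order
            if (costSoFar < bestCost) { bestCost = costSoFar; bestOrder = joined; }
            return;
        }
        // Prune: even the most optimistic finish can't beat the best so far.
        if (costSoFar + bound(remaining) >= bestCost)
            return;
        for (size_t i = 0; i < remaining.size(); ++i) {
            int next = remaining[i];
            joined.push_back(next);
            remaining.erase(remaining.begin() + i);
            search(joined, remaining,
                   costSoFar + stepCost(next, joined.size() - 1));
            remaining.insert(remaining.begin() + i, next);
            joined.pop_back();
        }
    }
};
```

To use it, seed remaining with the stream numbers 0..n-1, call search with
an empty joined vector, and read the cheapest order out of bestOrder. The
real heuristics (unique indexes first, and so on) would amount to a smarter
ordering of the candidates inside that loop.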
As an aside, the Interbase optimizer has much less work to do
than most (excluding, of course, Rdb/Eln and Netfrastructure).
Traditional relational database systems need to worry about
index selection as well. Interbase is perfectly happy, yea
eager, to AND/OR bitmaps from multiple index scans before
fetching records. The more the merrier.
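Here's a rough illustration of what ANDing and ORing index bitmaps means,
assuming each index scan yields a bitmap of candidate record numbers. The
dense bitset representation is my own simplification, not the Interbase
sparse-bitmap structure:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical dense bitmap of candidate record numbers produced by one
// index scan; this just illustrates combining scans before touching
// any actual records.
using RecordBitmap = std::vector<std::uint64_t>;

RecordBitmap bitmapAnd(const RecordBitmap &a, const RecordBitmap &b)
{
    RecordBitmap out(std::min(a.size(), b.size()));
    for (size_t i = 0; i < out.size(); ++i)
        out[i] = a[i] & b[i];               // record must match both indexes
    return out;
}

RecordBitmap bitmapOr(const RecordBitmap &a, const RecordBitmap &b)
{
    RecordBitmap out(std::max(a.size(), b.size()), 0);
    for (size_t i = 0; i < a.size(); ++i) out[i] |= a[i];
    for (size_t i = 0; i < b.size(); ++i) out[i] |= b[i];   // match either
    return out;
}
```

Only the record numbers that survive the combined bitmap ever get fetched,
which is why adding another usable index to the conjunction costs so little.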
Once the join order has been established, the optimizer loops
through the streams looking for an index or indexes to get
to the next stream. If it can't find one, it makes the
previous sequence of streams into a tributary. The next
phase combines the tributaries into rivers by sort/merge
joins, recursively, until one big fat river results. Voila!
(in fact, sort/merge joins are rare, usually mother nature
telling you that your database design stinks, and the
existence of the marketing department reminding you that,
well, never mind).
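For anyone who hasn't seen one, a sort/merge join just sorts both inputs on
the join key and walks them in lockstep. The sketch below shows the idea on
two in-memory vectors of (key, payload) pairs, which is of course far
simpler than merging two rivers of streams:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Minimal sort/merge join of two inputs on an integer key.  Both sides are
// sorted, then scanned in lockstep; equal-key groups are cross-producted,
// which is where the real cost hides when keys repeat.
using Row = std::pair<int, int>;   // (join key, payload)

std::vector<std::pair<Row, Row>> sortMergeJoin(std::vector<Row> left,
                                               std::vector<Row> right)
{
    std::sort(left.begin(), left.end());
    std::sort(right.begin(), right.end());

    std::vector<std::pair<Row, Row>> result;
    size_t l = 0, r = 0;
    while (l < left.size() && r < right.size()) {
        if (left[l].first < right[r].first)       ++l;
        else if (right[r].first < left[l].first)  ++r;
        else {
            const int key = left[l].first;
            // Find the end of the equal-key group on the right side.
            size_t rEnd = r;
            while (rEnd < right.size() && right[rEnd].first == key) ++rEnd;
            // Emit the cross product of the two equal-key groups.
            for (; l < left.size() && left[l].first == key; ++l)
                for (size_t j = r; j < rEnd; ++j)
                    result.push_back({left[l], right[j]});
            r = rEnd;
        }
    }
    return result;
}
```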
The output of the optimizer is a linked tree of objects
called RSBs (Record Source Blocks), each of which does
something simple well. Among the RSBs are exhaustive
retrieval, indexed retrieval, boolean sieve, cross
product, sort merge, and sort. Each RSB has three
entry points, Open, Fetch, and Close, and each blindly
calls other RSBs to feed it a stream. In other
words, the runtime requires (and has) no intelligence.
The optimizer is supposed to be smart and the execution
dumb and fast.
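Here's a minimal sketch of what that kind of interface might look like. The
three entry points mirror the description above (Open, Fetch, Close), but
the class names and the boolean-sieve example are my own illustration, not
the Interbase RSB implementation:

```cpp
#include <memory>
#include <vector>

// Hypothetical record type; the real engine passes records through streams
// tied to the request, not by value like this.
struct Record { std::vector<int> values; };

// One dumb node in the execution tree: Open, Fetch, Close, nothing else.
// All of the intelligence lives in the optimizer that wired the tree up.
class RecordSource {
public:
    virtual ~RecordSource() = default;
    virtual void Open() = 0;
    virtual bool Fetch(Record &out) = 0;   // false means end of stream
    virtual void Close() = 0;
};

// Boolean sieve: passes through records from its child that satisfy a
// predicate.  It blindly asks the child for records and knows nothing
// about indexes, join order, or where the child's records come from.
class BooleanSieve : public RecordSource {
public:
    BooleanSieve(std::unique_ptr<RecordSource> child,
                 bool (*predicate)(const Record &))
        : child_(std::move(child)), predicate_(predicate) {}

    void Open() override { child_->Open(); }
    bool Fetch(Record &out) override {
        while (child_->Fetch(out))
            if (predicate_(out))
                return true;
        return false;
    }
    void Close() override { child_->Close(); }

private:
    std::unique_ptr<RecordSource> child_;
    bool (*predicate_)(const Record &);
};
```

A cross product, sort, or sort/merge node would expose the same three entry
points; executing a query is just calling Fetch on the root node until it
says no.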
<flame>
A cost based optimizer is an attractive nuisance. Many
people think in terms of rule based optimizers (which don't
scale with complexity), and can't resist the temptation
to override a cost based optimizer with the odd rule here
and there to get around a glitch. This usually (read
"always") breaks the cost based scheme with really embarrassing
pessimizations. Rather than go back and repair the
architectural damage, the temptation is to build in a
mechanism of "plans" to report and/or override the optimizer.
A plan, in essence, is a bug on a bug. So the failure to
back up a plan required by a database procedure in gbak
is a bug on a bug on a bug. Plans are definitely in the
category of things database developers invent to make
poor performance somebody else's fault.
</flame>
Plans are now part of Interbase and therefore can't go
away. I hope that the need for plans goes away in the
very near future.
Jim Starkey
>From: "Claudio Valderrama C." <cvalde@...>Had I my druthers I duck this one; the current regime and I don't see
>
> Speaking about optimizations and indexes, Jim, can you write a bit about
>indexes and the query optimizer, please? There has been some controversy on
>how Interbase takes into account indexes to generate optimum query plans.
eye to eye. I'll start with the big picture then bad mouth plans.
The Interbase optimizer has three general phases: join order selection,
index join generation (streams), and joining of streams
into rivers by sort/merge.
Join order selection is cost based with the optimizer make a crude
guess of the number of pages necessary to get to a record. This
involves guesses of the cardinality of the various tables and
selectivity on indexes, none of which is known with any accuracy.
In theory this is simple: consider all possible join orders and
pick the cheapest. And this is what V1 did. Worked pretty well.
Then somebody threw a 17 way join. It still worked pretty well --
the resultant query executed in less than 10 seconds. Unfortuantely,
the optimization phase took 14 hours.
So the optimizer got a lot smarter. It learned to alway take
unique indexes ahead of non-unique and non-indexed and at
every stage to compute the best case for remaining subtree to
compare against the current best solution. After a couple of
days of reasonably hard work the 14 hour optimization came down
to about 45 seconds with the same answer.
As an aside, the Interbase optimizer has much less work to do
than most (excluding, of course, Rdb/Eln and Netfrastructure).
Traditional relational database systems need to worry about
index selection as well. Interbase is perfectly happy, yea
eager, to and/or bitmaps from multiple index scans before
fetching records. The more the merrier.
Once the join order has been established, the optimizer loops
through the streams looking for an index or indexes to get
to the next stream. If it can't find one, it makes the
previous sequence of streams into a tributary. The next
phase combines the tributaries into rivers by sort/merge
joins, recursively, until one big fat river results. Voila!
(in fact, sort/merge joins are rare, usually mother nature
telling you that your database design stinks, and the
existence of marketting department reminding you that,
well, never mind).
The output of the optimizer is a linked tree of objects
call RSBs (Record Source Blocks), each of which doesn't
something simple well. Among the RSBs are exhaustive
retrieval, indexed retrieval, boolean sieve, cross
product, sort merge, and sort. Each RSB has three
entry points, Open, Fetch, and Close, and each blindly
calls other RSBs to free them a stream. In other
words, the runtime requires (and has) no intelligence.
The optimizer is supposed to be smart and the execution
dumb and fast.
<flame>
A cost based optimizer is an attractive nuisance. Many
people think in terms of rule based optimizers (which don't
scale with complexity), and can't resist the temptation
to override a cost based optimizer with the odd rule there
and there to get around a glitch. This usually (read
"always") breaks the cost base scheme with really embarrassing
pessimizations. Rather than go back and repair the
architectural damage, the temptation is to build in a
mechanism of "plans" to report and/or override the optimizer.
A plan, in essense, is a bug on a bug. So the failure to
backup a plan required by a database procedure in gbak
is a bug on a bug on a bug. Plans are definitely in the
category of things database developers invent to make
poor performance somebody else fault.
</flame>
Plans are now part of Interbase and therefor can't go
away. I hope that the need for plans goes away in the
very near future.
Jim Starkey