Subject Re: [ib-support] Can this VIEW's be optimized?
Author Claudio Valderrama C.
"Juergen Bachsteffel" <juergen@...> wrote in message
news:5.1.0.14.0.20011214134010.00a77570@...-lokal.com...
> Hi!
>
> An external programmer team developed a application for a company I
consult
> now.

> They claim that the views cannot be optimized anymore

Generally speaking, you have workarounds.

Did you ask if they don't want to waste more time?
:-)

> and the optimation of
> Interbase is not so good.

This is painfully true, but it doesn't necessarily mean that you are doomed.


> The company should change to MS-SQLSERVER. I am not convinced, because the
> VIEWS look very complicated for me and are also nested.

If you run a turtle based on another turtle, the result can't be brilliant,
anyway.


> I am no SQL-Expert, but ....

Well, become one in the road, Juergen, because I can't explain you some
issues if you don't know the basics of SQL.


> Environment:
> Server IB4.2 (Windows)

It's still capable or delivering better performance.

> Clients: Delphi5 (using BDE)
> ( I know that BDE was not the best choice, but it was before my time)

It won't hurt you in this case and wisely used, it can be acceptable
(specially if the app does a lot of in stored procedures).


> I tested it by my own with a Firebird 0.9.4-Server. The VIEWS are also
very
> slow with the real production data.)

No surprise. Probably the code that does outer joins hasn't changed in the
latest years.


> I can extract also the whole data/metadata (Tables etc.) of the database
if
> needed.

At first glance, it's not necessary. Anyway, go with Aage questions because
there may be a missing index.

> FROM
> Artikel LEFT JOIN V_SUB_JC_ZEICHNUNGSNR ZNR ON Artikel.ID =
ZNR.Artikel_ID
> LEFT JOIN Kaufteile ON Artikel.ID = Kaufteile.ID
> LEFT JOIN Teile On Artikel.ID = Teile.ID
> LEFT JOIN ArtikelTypen ON Artikel.Typ_ID = ArtikelTypen.ID
> LEFT JOIN ArtikelEinheiten ON Artikel.Einheit_ID = ArtikelEinheiten.ID
> LEFT JOIN Gesamtbestaende ON Artikel.ID = Gesamtbestaende.Artikel_ID;

Do you know really if this is need? Do you know the structure of the db?
I've found (yes, I had to trobleshoot other people's misunderstandings, too)
that in most cases, the only needed outer join is the first, but it depends
on the data. Create a second view but replace the above part with

FROM
> Artikel LEFT JOIN
( V_SUB_JC_ZEICHNUNGSNR ZNR
JOIN Kaufteile ON Artikel.ID = Kaufteile.ID
JOIN Teile On Artikel.ID = Teile.ID
JOIN ArtikelTypen ON Artikel.Typ_ID = ArtikelTypen.ID
JOIN ArtikelEinheiten ON Artikel.Einheit_ID = ArtikelEinheiten.ID
JOIN Gesamtbestaende ON Artikel.ID = Gesamtbestaende.Artikel_ID
)
ON Artikel.ID = ZNR.Artikel_ID;

The reason is the following: most of the times, people want to ensure that
the first table gets all records no matter the matching record in the right
operand of the JOIN is absent or present. And probably, the other cases
don't need an outer join at all. Left joins aren't optimized as they should.
You'll notice I put parenthesis and made simple (inner) joins in the rest of
the cases. If they are using keyed fields, the indices will be used to
produce a fast join and then the result will be sent to a sort operation,
probably. Another question is if you really need all the info in your first
screen. Does the end user care to read 3 million records in a grid, for
example? Wouldn't it be preferable to put a couple of WHERE conditions and
limit the number of returned rows?

Again, in your second view:

FROM
Artikel LEFT JOIN
( V_SUB_JC_ZEICHNUNGSNR ZNR
JOIN Kaufteile ON Artikel.ID = Kaufteile.ID
JOIN Teile On Artikel.ID = Teile.ID AND Teile.NEUAUFLAGE = 1
JOIN ArtikelTypen ON Artikel.Typ_ID = ArtikelTypen.ID
JOIN ArtikelEinheiten ON Artikel.Einheit_ID = ArtikelEinheiten.ID
JOIN Gesamtbestaende ON Artikel.ID = Gesamtbestaende.Artikel_ID
)
ON Artikel.ID = ZNR.Artikel_ID;

Now, there's the remote possibility that for some weird reason all left
joins are needed. In this case, you're contemplating a bad design. I've hit
few cases that may justify more than a couple of nested LEFT JOINS. One
problem is that people find that

t1 left join t2 on <cond1> join t3 on <cond2> join t4 on <cond3>
doesn't work because the inner join wins finally over the left join (so the
left join has no effect) then they try
t1 left join t2 on <cond1> LEFT join t3 on <cond2> LEFT join t4 on <cond3>
that seems to do the trick, hence they say "can't go optimizing further"
and those people usually don't try to write instead
t1 left join (t2 join t3 on <cond2> join t4 on <cond3>) on <cond1>
that's what I tried to do with your views and that will hopefully spare the
server a couple of hard effort.

Here's an example:

SQL> create database 'join';
SQL> create table a(a int not null primary key);
SQL> create table b(b int not null primary key);
SQL> create table c(c int not null primary key);
SQL> create table d(d int not null primary key);
SQL> set plan;
SQL> select * from a join b on a=b join c on b=c join d on c=d;

PLAN JOIN (A NATURAL,D INDEX (RDB$PRIMARY4),C INDEX (RDB$PRIMARY3),B INDEX
(RDB$PRIMARY2))
SQL> select * from a left join b on a=b left join c on b=c left join d on
c=d;

PLAN JOIN (JOIN (JOIN (A NATURAL,B INDEX (RDB$PRIMARY2)),C INDEX
(RDB$PRIMARY3)),D INDEX (RDB$PRIMA
Y4))
SQL>

Not much. The sequence has been reordered. This is probably because the
engine knows that PKs don't accept null values (IB/FB creates indices
automagically for PK, Unique and FK constraints). However, with non-unique
indices (PKs that accept NULLs, for example), data and once the db was
backed up and restored at least one time, selectivity has been calculated in
the indices and then the engine doesn't react the same than with no data to
complex statements.

If the worst happen and you found such a design that really needs all LEFT
JOINs, then you should:

a) Convert the chain of left joins into a select on the first table,
followed by nested statements. However, in your case, since all the
conditions don't refer to the first table but related the prior and the next
table, it can't be done.

b) Convert the view into a selectable stored procedure and handle the third,
fourth and Nth table with separate singleton SELECTs inside the main FOR DO
loop. Then you select from the proc as it if was a simple table. This would
be useful if you know that the tables in the LEFT JOIN part can contribute
zero or one record, not more records. This is the body of an example proc:

set term ^;
create procedure p(input params if needed)
returns (f1 smallint, f2 char(10), f3 numeric(9,4))
as begin
FOR select field from tbl1 INTO :f1 DO
begin
f2 = null; f3 = null;
/* I clean those output params because if the singleton select returns
zero records, f2 & f3 aren't assigned and if they had values from the
previous iteration of the outer FOR, you would return wrong data instead
of NULLs. This is the special care when emulating left or right joins. */
select t2.field2, t3.field3 from
t2 join t3 on t2.f = t3.f
where t2.key = :f1
into :f2, :f3;
end
end ^
set term ;^


That's your third view:

> CREATE VIEW V_SUB_JC_ZEICHNUNGSNR (
> ARTIKEL_ID,
> ZEICHNUNGSNUMMER)
> AS SELECT
> ArtikelArtikelZusatzInfos.Artikel_ID,
> Zeichnungen.Zeichnungsnummer
> FROM
> ArtikelArtikelZusatzInfos,
> Zeichnungen
> WHERE
> ArtikelArtikelZusatzInfos.ArtikelZusatzInfo_ID = Zeichnungen.ID;

I was going to suggest that you change it to an explicit JOIN syntax:

FROM
ArtikelArtikelZusatzInfos JOIN
Zeichnungen
ON ArtikelArtikelZusatzInfos.ArtikelZusatzInfo_ID = Zeichnungen.ID;

but don't do that, since I remembered that IB4.2 was more optimized to work
with the old deprecated & implicit inner join than with the new explicit
syntax. Leave it as it's for now.

I have a hunch why this third view was created: devs didn't figure out a way
to merge that INNER JOIN statement into the LEFT JOIN mess shown in the
views that call it. Once you test my change, you may try to find if merging
this view's statement directly in the other two instead of executing it
separately produces some change. Now, you know how to isolate different
types of JOINs with a couple of parenthesis, so writing the third's view SQL
directly into the other views should be possible, if you observe that it
gives better performance.

C.
--
Claudio Valderrama C. - http://www.cvalde.com - http://www.firebirdSql.org
Independent developer
Owner of the Interbase® WebRing