firebird-support - Re: [firebird-support] Optimization Questions (loooong)

Subject	Re: [firebird-support] Optimization Questions (loooong)
Author	Arno Brinkman
Post date	2004-05-25T20:03:14Z

Hi,

> >The thing i notice from your PLAN is that for table Empresa_CredorDevedor
> >ECD two indexes (SK_EMPRESA_CREDORDEVEDOR_NOME,RDB$PRIMARY26) are used.
> >Becuase this is at the 6th inner join this could really slow down things.
> >
> Don't think so....
>
> SK_EMPRESA_CREDORDEVEDORNOME is a good index. take a look at the

statistics:

Okay, but it uses also RDB$PRIMARY26 and will combine the result of both
indexes. This is not necessary slow, but it was a suggestion from me.

> RDB$INDEX_NAME RDB$STATISTICS
> ================================ ========
> RDB$PRIMARY292 0.1428571
> RDB$PRIMARY357 1.0000000
> RDB$FOREIGN305 0.0000362
> RDB$PRIMARY296 0.0000334
> RDB$FOREIGN298 0.0007955
> RDB$PRIMARY295 0.0769230
> RDB$PRIMARY294 0.1250000
> RDB$PRIMARY41 0.5000000
> RDB$PRIMARY26 0.0004235
> RDB$PRIMARY17 0.0052083
> RDB$PRIMARY16 0.0714285
> RDB$PRIMARY5 0.0294117
> RDB$PRIMARY4 0.0055865
> RDB$PRIMARY1 0.1000000
> SK_EMPRESA_CREDORDEVEDOR_NOME 0.0004249
>
> Another thing, on the query that the execution is fast, table
> Empresa_CredorDevedor ecd uses the same 2 indices, the diferrence is
> that the optimizer put it in the first join, where I think it should
> stay on every query. The index SK_Empresa_CredorDevedor_Nome in on
> "Name" field, that are "almost" unique. I have 8 duplicated names in
> 2361 records.

<snip>

> >>
> >>PLAN (BORDERO INDEX (RDB$PRIMARY5))
> >>PLAN JOIN (ECD INDEX (SK_EMPRESA_CREDORDEVEDOR_NOME),TR INDEX
> >>(RDB$FOREIGN298),ME INDEX (RDB$PRIMARY357),TRB INDEX
> >>(RDB$FOREIGN305),TS2 INDEX (RDB$PRIMARY292),TT INDEX (RDB$PRIMARY294),CC
> >>INDEX (RDB$PRIMARY17),M INDEX (RDB$PRIMARY41),TBT INDEX
> >>(RDB$PRIMARY295),CB INDEX (RDB$PRIMARY16),A INDEX (RDB$PRIMARY1),BB
> >>INDEX (RDB$PRIMARY4),TS1 INDEX (RDB$PRIMARY292))
> >>
> >>Returns 79 rows
> >>Execution Time: 15ms
> >>Fetch Time: 13ms
> >
> >The optimizer chooses a query PLAN which uses a index for _every_ table,

> >the query execution starts already with an limited resultset from table
> >Empresa_CredorDevedor and starts joining from that point.
> >
> Yes. This is very good, the SK_EmpresaCredorDevedor_Nome returns only
> one row from 2391, and TituloReceber are linked to Empresa_CredorDevedor,
> There are 81 records on table TituloReceber that are related to table
> Empresa_CredorDevedor with ID 941 (given by the search on Name "CISPLA%"
> - index SK_EmpresaCredorDevedor_Nome), the total records on table
> TituloReceber is 29899, the "Natural on TituloReceber" slows things

down...

>
> What I want is that in every of those samples the join should start on
> Empresa_CredorDevedor using index SK_EmpresaCredorDevedor_Nome, and then
> go on....

I understand, because you know exaclty what you're going to ask.
As i already told the real problem here is that the optimizer doesn't take
into account the LIKE index for the JOIN order.

> >>If I convert the outer join to an inner join (of course will be faster,
> >>but I was not expecting a so huge difference, and don't undersant a plan
> >>so diferent)
> >>
> >>select
> >> trb.TituloReceberBaixaId,
> >> tr.BorderoId,
> >> Bo.Numero
> >>from
> >> TituloReceberBaixa trb join
> >> TituloReceber tr on (trb.TituloReceberId =
> >>tr.TituloReceberId) join
> >> TituloTipo tt on (tr.TituloTipoId =
> >>tt.TituloTipoId) join
> >> Empresa_CredorDevedor ecd on (tr.EmpresaCredorDevedorId =
> >>ecd.Empresa_CredorDevedor_Id) join
> >> ContaContabil cc on (tr.ContaContabilId =
> >>cc.ContaContabilId) join
> >> Moeda m on (tr.MoedaId = m.MoedaId) join
> >> TituloBaixaTipo tbt on (trb.TituloBaixaTipoId =
> >>tbt.TituloBaixaTipoId) join
> >> ContaBancaria cb on (cb.ContaBancariaId =
> >>trb.ContaBancariaId + 0) join
> >> AgenciaBancaria a on (cb.AgenciaBancariaId =
> >>a.AgenciaBancariaId) join
> >> Banco bb on (a.BancoId = bb.BancoId) join
> >> TituloStatus ts1 on (tr.StatusId = ts1.TituloStatusId)
> >>join
> >> TituloStatus ts2 on (trb.StatusId =
> >>ts2.TituloStatusId) join
> >> MultiEmpresa me on (me.MultiEmpresaID =
> >>tr.MultiEmpresaID + 0) join
> >> Bordero bo on (BorderoID = tr.BorderoID)
> >>where
> >> ecd.Nome like 'CISPL%'
> >>
> >>PLAN JOIN (ECD INDEX (SK_EMPRESA_CREDORDEVEDOR_NOME),TR INDEX
> >>(RDB$FOREIGN298),BO INDEX (RDB$PRIMARY5),TRB INDEX (RDB$FOREIGN305),TS2
> >>INDEX (RDB$PRIMARY292),TT INDEX (RDB$PRIMARY294),CC INDEX
> >>(RDB$PRIMARY17),M INDEX (RDB$PRIMARY41),TBT INDEX (RDB$PRIMARY295),CB
> >>INDEX (RDB$PRIMARY16),A INDEX (RDB$PRIMARY1),BB INDEX (RDB$PRIMARY4),TS1
> >>INDEX (RDB$PRIMARY292),ME INDEX (RDB$PRIMARY357))
> >>
> >>Returns 5 rows
> >>Execution Time: 6ms
> >>Fetch Time: 12ms
> >
> >The same as above, a index is used for every table.
> >
> But why the plan could not be similar, if I use an outer join ?

The optimizer can only decide the order for INNER JOINs.
Thus you could change this query to :

select
trb.TituloReceberBaixaId,
tr.BorderoId,
Bo.Numero
from

Empresa_CredorDevedor ecd
JOIN TituloReceber tr ON
(tr.EmpresaCredorDevedorId = ecd.Empresa_CredorDevedor_Id + 0)
LEFT JOIN Bordero bo ON (bo.BorderoID = tr.BorderoID)
JOIN TituloReceberBaixa trb ON (trb.TituloReceberId = tr.TituloReceberId)
JOIN TituloTipo tt ON (tr.TituloTipoId = tt.TituloTipoId)
JOIN ContaContabil cc ON
(tr.ContaContabilId = cc.ContaContabilId)
JOIN Moeda m ON (tr.MoedaId = m.MoedaId)
JOIN TituloBaixaTipo tbt ON (trb.TituloBaixaTipoId =
tbt.TituloBaixaTipoId)
JOIN ContaBancaria cb ON (cb.ContaBancariaId = trb.ContaBancariaId + 0)
JOIN AgenciaBancaria a ON (cb.AgenciaBancariaId = a.AgenciaBancariaId)
JOIN Banco bb ON (a.BancoId = bb.BancoId)
JOIN TituloStatus ts1 ON (tr.StatusId = ts1.TituloStatusId)
JOIN TituloStatus ts2 ON (trb.StatusId = ts2.TituloStatusId)
JOIN MultiEmpresa me ON (me.MultiEmpresaID = tr.MultiEmpresaID + 0)
WHERE
ecd.Nome like 'CISPL%'

Maybe the "tr.MultiEmpresaID + 0" could also be changed to
"tr.MultiEmpresaID".

Then the first three tables join order should be forced to ECD, TR, BO

> The outer join is on the last table (so, should be executed last right ?)
> Why not process every inner join (that you be very fast), and after that
> process the outer join, with very few select rows.
> But a Natural on Table TituloReceber is used instead....

In the examples you post Bordero was as last with a LEFT JOIN and will
certainly be fetched as last table (for every record as result from all
inner joins together).

> >Have you tried moving "ecd.Nome like 'CISPL%'" to the ON clause ?
> >
> No, I had no tried it...
>
> PLAN JOIN (JOIN (ECD INDEX (SK_EMPRESA_CREDORDEVEDOR_NOME),TR INDEX
> (RDB$FOREIGN298),ME INDEX (RDB$PRIMARY357),TRB INDEX
> (RDB$FOREIGN305),TS2 INDEX (RDB$PRIMARY292),TT INDEX (RDB$PRIMARY294),CC
> INDEX (RDB$PRIMARY17),M INDEX (RDB$PRIMARY41),TBT INDEX
> (RDB$PRIMARY295),CB INDEX (RDB$PRIMARY16),A INDEX (RDB$PRIMARY1),BB
> INDEX (RDB$PRIMARY4),TS1 INDEX (RDB$PRIMARY292)),BO INDEX (RDB$PRIMARY5))
>
> Returns 79 rows
> Execution Time: 16ms
> Fetch Time: 20ms
> (but know I have some other tasks running that could make this query a bit

slow...)

>
>
> This is the EXACTLY plan I was expecting.... :-)
>
> But I could not use this query.
> The select I show are views, and I cannot hardcode something on the on
> clause on the view, because the views could be executed in diferent
> forms, wih direferent search criteria :-(

Okay, that isn't going to work.

> >Did you compare with fetching all rows for all queries ?
> >
> I am using IBPlan Analyzer, that always do a fetch all.

No problem with that thus.

> I don't expect that put a lot of selects inside de mais select will make
> things fast, in the first case, was just one, but on this was 5... I was
> surprised...

Subselects should result in the same speed as a LEFT JOIN if only 1 field is
selected, but when you need more fields then it will be slow down.

> >Again you see 2 indexes choosen for ECD.
> >
> >>tr.MultiEmpresaID is a column that got indexed because of a FK, this is
> >>a VERY BAD index (I still prefer to have declarative FK constraints on
> >>create table statment, instead of enforce it via triggers, I tend to
> >>disable the indexes on views adding +0 to the column). When FB permits
> >>to create FK's without indexes I will be happy... ;-)
> >
> >Because this index is used for the very first table i don't think this

index

> >will really hurt in this case.
> >
> The problem here is causes by use of index FK_TITULORECEBER_MEMPRESA,
> that have the same value for all rows. (very bad index !!)
> for example

<snip>
MultiEmpresa me on (tr.MultiEmpresaID = me.MultiEmpresaID)

> where
> ecd.Nome like 'CISPLA%'
>
> PLAN JOIN (ME NATURAL,TR INDEX (FK_TITULORECEBER_MEMPRESA),TS INDEX
> (RDB$PRIMARY292),TT INDEX (RDB$PRIMARY294),ECD INDEX
> (SK_EMPRESA_CREDORDEVEDOR_NOME,RDB$PRIMARY26),CC INDEX (RDB$PRIMARY17),M
> INDEX (RDB$PRIMARY41),U INDEX (RDB$PRIMARY91))
>
> Returns 81 rows
> Execution Time: 300ms
> Fetch Time: 2500ms
>
> If I force to disable teh index usage, I got a much better performance:

<snip>

> TituloStatus ts on (tr.StatusId = ts.TituloStatusId) join
> MultiEmpresa me on (tr.MultiEmpresaID + 0 =

me.MultiEmpresaID)

> where
> ecd.Nome like 'CISPLA%'
>
> PLAN JOIN (ECD INDEX (SK_EMPRESA_CREDORDEVEDOR_NOME),TR INDEX
> (RDB$FOREIGN298),ME INDEX (RDB$PRIMARY357),TT INDEX (RDB$PRIMARY294),CC
> INDEX (RDB$PRIMARY17),M INDEX (RDB$PRIMARY41),U INDEX (RDB$PRIMARY91),TS
> INDEX (RDB$PRIMARY292))
>
> note this line:
>
> MultiEmpresa me on (tr.MultiEmpresaID + 0 = me.MultiEmpresaID)
>
> Returns 81 rows
> Execution Time: 50ms
> Fetch Time: 70ms

The JOIN order is completly changed, this doesn't necessary meant that the
index FK_TITULORECEBER_MEMPRESA is the real problem here.

What happens if you changed the query to this :

select
*
from
TituloReceber tr join
TituloTipo tt on (tr.TituloTipoId = tt.TituloTipoId) join
Empresa_CredorDevedor ecd on (tr.EmpresaCredorDevedorId =
ecd.Empresa_CredorDevedor_Id + 0) join
ContaContabil cc on (tr.ContaContabilId =
cc.ContaContabilId) join
Moeda m on (tr.MoedaId = m.MoedaId) join
Usuario u on (tr.UsuarioId = u.UsuarioId) join
TituloStatus ts on (tr.StatusId = ts.TituloStatusId) join
MultiEmpresa me on (tr.MultiEmpresaID = me.MultiEmpresaID)
where
ecd.Nome like 'CISPLA%'

> If I remove the index and still put some filter on the where clause I
> got nothing, but without that filter I got a great improvement...
>
> All records on this table have tr.MultiEmpresaID = 1
>
> So in this case with or without that clause I got the same records, but
> it is just a case, I have other sites that have more than one recod on
> MultiEmpresa. I don't know the correct translation to English, but I
> will try to explain. I have some clients that has more then one
> enterprise inside the same database, other have just one... When the
> user log on my system he could choose to see consolidated data (data
> from all enterprises togheter), or filtered for just one enterprise. If
> the user chooses to see from just one, all queries are append with
> "and MultiEmpresaID = :SomeID".

But a index on that field can be very helpfull when you've more enterprises
and you need to collect data for only 1. Even when there are "many"
duplicates.

> this query (wich shows data from all Enterprise, i.e. no MultiEmpresaID
> filter)
> select
> *
> from
> TituloReceber tr join
> TituloTipo tt on (tr.TituloTipoId = tt.TituloTipoId)

join

> Empresa_CredorDevedor ecd on (tr.EmpresaCredorDevedorId =
> ecd.Empresa_CredorDevedor_Id) join
> ContaContabil cc on (tr.ContaContabilId =
> cc.ContaContabilId) join
> Moeda m on (tr.MoedaId = m.MoedaId) join
> Usuario u on (tr.UsuarioId = u.UsuarioId) join
> TituloStatus ts on (tr.StatusId = ts.TituloStatusId) join
> MultiEmpresa me on (tr.MultiEmpresaID + 0 =
> me.MultiEmpresaID)
> where
> ecd.Nome like 'CISPLA%'
> PLAN JOIN (ECD INDEX (SK_EMPRESA_CREDORDEVEDOR_NOME),TR INDEX
> (RDB$FOREIGN298),ME INDEX (RDB$PRIMARY357),TT INDEX (RDB$PRIMARY294),CC
> INDEX (RDB$PRIMARY17),M INDEX (RDB$PRIMARY41),U INDEX (RDB$PRIMARY91),TS
> INDEX (RDB$PRIMARY292))
>
> runs very fast
>
> but if I filter on MultiEmpresaID I got slow response and a Natural scan
> on my table
> select
> *
> from
> TituloReceber tr join
> TituloTipo tt on (tr.TituloTipoId = tt.TituloTipoId)

join

> Empresa_CredorDevedor ecd on (tr.EmpresaCredorDevedorId =
> ecd.Empresa_CredorDevedor_Id) join
> ContaContabil cc on (tr.ContaContabilId =
> cc.ContaContabilId) join
> Moeda m on (tr.MoedaId = m.MoedaId) join
> Usuario u on (tr.UsuarioId = u.UsuarioId) join
> TituloStatus ts on (tr.StatusId = ts.TituloStatusId) join
> MultiEmpresa me on (tr.MultiEmpresaID + 0 =
> me.MultiEmpresaID)
> where
> ecd.Nome like 'CISPLA%' and
> tr.MultiEmpresaID = 1
> PLAN JOIN (TR NATURAL,ME INDEX (RDB$PRIMARY357),TT INDEX
> (RDB$PRIMARY294),ECD INDEX
> (SK_EMPRESA_CREDORDEVEDOR_NOME,RDB$PRIMARY26),CC INDEX (RDB$PRIMARY17),M
> INDEX (RDB$PRIMARY41),U INDEX (RDB$PRIMARY91),TS INDEX (RDB$PRIMARY292))

The JOIN order is the main difference here which causes the slow down.

> >Also almost no difference, because i think you're looking at the wrong
> >point. Table ECD is the problem here i guess.
> >
> >
> >
> I don't understand why have a natural plan on TituloReceber (TR) if I
> add the where MultiEmpresaID + 0= 1
>
> >STARTING WITH is not taken into consideration for a JOIN ORDER (yet),

thus

> >for special cases you've to workaround that fact.
> >
> >
> >SELECT
> > tr.Numero,
> > tr.Parcela,
> > tr.EmpresaCredorDevedorId,
> > ecd.Nome,
> > <snip>
> >FROM
> > TituloReceber tr
> > JOIN TituloTipo tt ON (tr.TituloTipoId = tt.TituloTipoId)
> > JOIN Empresa_CredorDevedor ecd ON
> > (tr.EmpresaCredorDevedorId = ecd.Empresa_CredorDevedor_Id + 0)
> > JOIN ContaContabil cc ON (tr.ContaContabilId = cc.ContaContabilId)
> > JOIN Moeda m ON (tr.MoedaId = m.MoedaId)
> > JOIN Usuario u ON (tr.UsuarioId = u.UsuarioId)
> > JOIN TituloStatus ts ON (tr.StatusId = ts.TituloStatusId)
> > JOIN MultiEmpresa me ON (tr.MEmpresaID = me.MultiEmpresaID)
> >WHERE
> > ecd.Nome LIKE 'CISPLA%' and
> > tr.MEmpresaID = 1;
> >
> >
> >In above query i force the optimizer not use a index for
> >Empresa_CredorDevedor_Id, so if i'm right the optimizer chooses a JOIN

order

> >where the table ECD will be at the first place. And though a index will

> >used on this table, because of the LIKE.
> >Thus what happens in all mentioned slow queries when you add "+ 0" to
> >Empresa_CredorDevedor_Id in the ON clause ?
> >
> Empresa_CredorDevedor_Id is a good index
> the plan for your siggested query is:
> PLAN JOIN (JOIN (TR NATURAL,ME INDEX (RDB$PRIMARY357),TT INDEX
> (RDB$PRIMARY294),CC INDEX (RDB$PRIMARY17),M INDEX (RDB$PRIMARY41),U
> INDEX (RDB$PRIMARY91),TS INDEX (RDB$PRIMARY292)),ECD INDEX
> (SK_EMPRESA_CREDORDEVEDOR_NOME))

This is not the PLAN i hoped to get, ECD is at the last position in the JOIN
order and it need to be at the first position. I've my questions why the
optimizer has choosen this PLAN.

> If you wish I can make a vnc session for you, or open a temporary
> internet access to my server

When you have the ability to send me a zipped backup that would also be
fine, but i understand when you can not.

Send me a private mail : firebird at abvisie.nl

Regards,
Arno Brinkman
ABVisie

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Firebird open source database (based on IB-OE) with many SQL-99 features :
http://www.firebirdsql.org
http://www.firebirdsql.info
http://www.fingerbird.de/
http://www.comunidade-firebird.org/

Support list for Interbase and Firebird users :
firebird-support@yahoogroups.com

Nederlandse firebird nieuwsgroep :
news://80.126.130.81