Subject | Re: IS OPTIMIZER OPTIMIZING ? |
---|---|
Author | Lista de Discução Interbase |
Post date | 2002-05-02T15:30:36Z |
Hi guys !
I will join the optimizing discussion and ask you something.
First I will show my table structure:
CREATE DOMAIN FATOR AS NUMERIC(9, 4) NOT NULL;
CREATE DOMAIN INTEIRO AS INTEGER NOT NULL;
CREATE DOMAIN MEMO AS VARCHAR(2000);
CREATE DOMAIN PRECO AS NUMERIC(15, 4) NOT NULL;
CREATE DOMAIN QUANTIDADE AS NUMERIC(15, 4) NOT NULL;
CREATE DOMAIN STRING40 AS VARCHAR(40) NOT NULL;
CREATE DOMAIN STRING80 AS VARCHAR(80) NOT NULL;
CREATE DOMAIN TABLEID AS INTEGER NOT NULL;
Table ServicoComposicao has 17404 records
/* Table: SERVICOCOMPOSICAO, Owner: SYSDBA */
CREATE TABLE SERVICOCOMPOSICAO
(
SERVICOCOMPOSICAOID TABLEID,
SERVICOID TABLEID,
ITEM INTEIRO,
PRODUTOID TABLEID,
QUANTIDADE QUANTIDADE,
FATOR FATOR,
CONSTRAINT PK_SERVICOCOMPOSICAO PRIMARY KEY (SERVICOCOMPOSICAOID)
);
ALTER TABLE SERVICOCOMPOSICAO ADD CONSTRAINT FK_SERVICOCOMPOSICAO_PRODUTO
FOREIGN KEY (PRODUTOID) REFERENCES PRODUTO (PRODUTOID);
ALTER TABLE SERVICOCOMPOSICAO ADD CONSTRAINT FK_SERVICOCOMPOSICAO_SERVICO
FOREIGN KEY (SERVICOID) REFERENCES SERVICO (SERVICOID);
Table LinhaProduto has 186 records
/* Table: LINHAPRODUTO, Owner: SYSDBA */
CREATE TABLE LINHAPRODUTO
(
LINHAPRODUTOID TABLEID,
DESCRICAO STRING80,
CONSTRAINT PK_LINHAPRODUTO PRIMARY KEY (LINHAPRODUTOID)
);
Table Produto has 3139 records
/* Table: PRODUTO, Owner: SYSDBA */
CREATE TABLE PRODUTO
(
PRODUTOID TABLEID,
LINHAPRODUTOID TABLEID,
CODIGO STRING40,
DESCRICAO STRING80,
DESCRICAOCOMPLETA MEMO,
ESTOQUEFISICO QUANTIDADE,
QUANTIDADETRANSITO QUANTIDADE,
ESTOQUEMINIMO QUANTIDADE,
PRECOCUSTO PRECO,
UNIDADEID TABLEID,
CMP_QUANTIDADEDISPONIVEL QUANTIDADE,
CONSTRAINT PK_PRODUTO PRIMARY KEY (PRODUTOID)
);
ALTER TABLE PRODUTO ADD CONSTRAINT FK_PRODUTO_LINHAPRODUTO FOREIGN KEY
(LINHAPRODUTOID) REFERENCES LINHAPRODUTO (LINHAPRODUTOID);
The above tables are small, every one has a PK field (ServicoID, ProdutoID,
LinhaProdutoID) type integer generated by generators, so small key with
high selectivity.
I'd like to understand a little bit on how optimizer works. Lets show some
querys.
select
SC.ServicoID, LP.LinhaProdutoID, LP.Descricao, sum(SC.Quantidade *
P.PrecoCusto)
from
ServicoComposicao SC join
Produto P on (P.ProdutoID = ServicoComposicao.ProdutoID) join
LinhaProduto LP on (LP.LinhaProdutoID = P.LinhaProdutoID)
where
SC.ServicoID = 20310
group by
SC.ServicoID, LP.LinhaProdutoID, LP.Descricao
PLAN SORT (JOIN (SC INDEX (AK_SERVICOCOMPOSICAO_ITEM),P INDEX
(RDB$PRIMARY10),LP INDEX (RDB$PRIMARY8)))
Execution Time: 00:00:00.0010 (1st time)
Execution Time: 00:00:00.0000 (2nd time)
Execution Time: 00:00:00.0000 (3rd time)
Execution Time: 00:00:00.0000 (4rd time)
This query returns 7 rows.
the result is instantaneous. no problem here.
But if I create a view like this:
CREATE VIEW VSERVICOCOMPOSICAOLINHAPRODUTO (
ID,
LINHAPRODUTOID,
LINHAPRODUTODESCRICAO,
CUSTOTOTAL
) AS
select
SC.ServicoID, LP.LinhaProdutoID, LP.Descricao, sum(SC.Quantidade *
P.PrecoCusto)
from
ServicoComposicao SC join
Produto P on (P.ProdutoID = ServicoComposicao.ProdutoID) join
LinhaProduto LP on (LP.LinhaProdutoID = P.LinhaProdutoID)
group by
SC.ServicoID, LP.LinhaProdutoID, LP.Descricao;
and send the following query:
Select * from vServicoComposicaoLinhaProduto where ID = 20310
the plan is:
PLAN SORT (JOIN (VSERVICOCOMPOSICAOLINHAPRODUTO LP
NATURAL,VSERVICOCOMPOSICAOLINHAPRODUTO P INDEX
(RDB$FOREIGN28),VSERVICOCOMPOSICAOLINHAPRODUTO SC INDEX (RDB$FOREIGN23)))
Execution Time: 00:00:00.0631 (1st time)
Execution Time: 00:00:00.0621 (2nd time)
Execution Time: 00:00:00.0641 (3rd time)
Execution Time: 00:00:00.0641 (4rd time)
This query returns 7 rows too (of course).
I know the response time is tiny, but I can see a big diference when run
the query, the first just appears, the second one I see the delay.
Comparing the execution time reported by IBConsole the first one is from 60
times faster to infinitum times faster :-). And I hope these tables will
grow up to 10 times bigger than it was now.
Why optimizer choose to do a natural search on table LinhaProduto, why not
choose the PK Index of table LinhaProduto ?
I think the plan of the two queries should be the same, since I am doing
the same query, just that the second is trough a view, I thought tyhat the
view just encapsulates the query making it more readable, when someone send
a query using a view the "engine" substitute the view by the equivalent
statment, but now I am sure I am wrong with this. As far I can recall, I do
some tests sending to queries like the above in MSSQL and always the plan
is the same.
I noticed another strange (in my opinion) behavior of the engine, I am
sorry, but I need to find it again to show you, the optimizer choose to do
a natural search on one table and I cannot force the plan to use an index
and the engine only execute the query in a natural way.
Below are some statistics from the database:
Database header page information:
Flags 0
Checksum 12345
Generation 84
Page size 4096
ODS version 10.0
Oldest transaction 65
Oldest active 66
Oldest snapshot 66
Next transaction 78
Bumped transaction 1
Sequence number 0
Next attachment ID 0
Implementation ID 16
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 1
Creation date May 2, 2002 11:47:58
LINHAPRODUTO (134)
Primary pointer page: 177, Index root page: 178
Data pages: 4, data page slots: 4, average fill: 63%
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 3
80 - 99% = 0
Index RDB$PRIMARY8 (0)
Depth: 1, leaf buckets: 1, nodes: 186
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 1
40 - 59% = 0
60 - 79% = 0
80 - 99% = 0
Index SK_LINHAPRODUTO_DESCRICAO (1)
Depth: 1, leaf buckets: 1, nodes: 186
Average data length: 13.00, total dup: 47, max dup: 17
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 1
PRODUTO (136)
Primary pointer page: 181, Index root page: 182
Data pages: 149, data page slots: 149, average fill: 86%
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 0
80 - 99% = 148
Index AK_PRODUTO_CODIGO (1)
Depth: 2, leaf buckets: 6, nodes: 3139
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 1
80 - 99% = 5
Index RDB$FOREIGN28 (3)
Depth: 2, leaf buckets: 5, nodes: 3139
Average data length: 0.00, total dup: 2965, max dup: 86
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 1
80 - 99% = 4
Index RDB$PRIMARY10 (0)
Depth: 2, leaf buckets: 6, nodes: 3139
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 0
80 - 99% = 5
Index SK_PRODUTO_DESCRICAO (2)
Depth: 2, leaf buckets: 18, nodes: 3139
Average data length: 16.00, total dup: 40, max dup: 24
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 1
80 - 99% = 17
SERVICOCOMPOSICAO (130)
Primary pointer page: 168, Index root page: 169
Data pages: 303, data page slots: 303, average fill: 67%
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 303
80 - 99% = 0
Index AK_SERVICOCOMPOSICAO_ITEM (1)
Depth: 2, leaf buckets: 34, nodes: 17404
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 33
Index RDB$FOREIGN22 (2)
Depth: 2, leaf buckets: 27, nodes: 17404
Average data length: 0.00, total dup: 13626, max dup: 45
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 27
Index RDB$FOREIGN23 (3)
Depth: 2, leaf buckets: 27, nodes: 17404
Average data length: 0.00, total dup: 14607, max dup: 1172
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 1
80 - 99% = 26
Index RDB$PRIMARY4 (0)
Depth: 2, leaf buckets: 31, nodes: 17404
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 30
The results are the same on Firebird 1 on WinXP and on Firebird
1.0.0-0a.Beta2 on RedHat Linux.
Sorry for my bad English and for the long message, and I hope you could
understand me.
[]s
Alexandre
I will join the optimizing discussion and ask you something.
First I will show my table structure:
CREATE DOMAIN FATOR AS NUMERIC(9, 4) NOT NULL;
CREATE DOMAIN INTEIRO AS INTEGER NOT NULL;
CREATE DOMAIN MEMO AS VARCHAR(2000);
CREATE DOMAIN PRECO AS NUMERIC(15, 4) NOT NULL;
CREATE DOMAIN QUANTIDADE AS NUMERIC(15, 4) NOT NULL;
CREATE DOMAIN STRING40 AS VARCHAR(40) NOT NULL;
CREATE DOMAIN STRING80 AS VARCHAR(80) NOT NULL;
CREATE DOMAIN TABLEID AS INTEGER NOT NULL;
Table ServicoComposicao has 17404 records
/* Table: SERVICOCOMPOSICAO, Owner: SYSDBA */
CREATE TABLE SERVICOCOMPOSICAO
(
SERVICOCOMPOSICAOID TABLEID,
SERVICOID TABLEID,
ITEM INTEIRO,
PRODUTOID TABLEID,
QUANTIDADE QUANTIDADE,
FATOR FATOR,
CONSTRAINT PK_SERVICOCOMPOSICAO PRIMARY KEY (SERVICOCOMPOSICAOID)
);
ALTER TABLE SERVICOCOMPOSICAO ADD CONSTRAINT FK_SERVICOCOMPOSICAO_PRODUTO
FOREIGN KEY (PRODUTOID) REFERENCES PRODUTO (PRODUTOID);
ALTER TABLE SERVICOCOMPOSICAO ADD CONSTRAINT FK_SERVICOCOMPOSICAO_SERVICO
FOREIGN KEY (SERVICOID) REFERENCES SERVICO (SERVICOID);
Table LinhaProduto has 186 records
/* Table: LINHAPRODUTO, Owner: SYSDBA */
CREATE TABLE LINHAPRODUTO
(
LINHAPRODUTOID TABLEID,
DESCRICAO STRING80,
CONSTRAINT PK_LINHAPRODUTO PRIMARY KEY (LINHAPRODUTOID)
);
Table Produto has 3139 records
/* Table: PRODUTO, Owner: SYSDBA */
CREATE TABLE PRODUTO
(
PRODUTOID TABLEID,
LINHAPRODUTOID TABLEID,
CODIGO STRING40,
DESCRICAO STRING80,
DESCRICAOCOMPLETA MEMO,
ESTOQUEFISICO QUANTIDADE,
QUANTIDADETRANSITO QUANTIDADE,
ESTOQUEMINIMO QUANTIDADE,
PRECOCUSTO PRECO,
UNIDADEID TABLEID,
CMP_QUANTIDADEDISPONIVEL QUANTIDADE,
CONSTRAINT PK_PRODUTO PRIMARY KEY (PRODUTOID)
);
ALTER TABLE PRODUTO ADD CONSTRAINT FK_PRODUTO_LINHAPRODUTO FOREIGN KEY
(LINHAPRODUTOID) REFERENCES LINHAPRODUTO (LINHAPRODUTOID);
The above tables are small, every one has a PK field (ServicoID, ProdutoID,
LinhaProdutoID) type integer generated by generators, so small key with
high selectivity.
I'd like to understand a little bit on how optimizer works. Lets show some
querys.
select
SC.ServicoID, LP.LinhaProdutoID, LP.Descricao, sum(SC.Quantidade *
P.PrecoCusto)
from
ServicoComposicao SC join
Produto P on (P.ProdutoID = ServicoComposicao.ProdutoID) join
LinhaProduto LP on (LP.LinhaProdutoID = P.LinhaProdutoID)
where
SC.ServicoID = 20310
group by
SC.ServicoID, LP.LinhaProdutoID, LP.Descricao
PLAN SORT (JOIN (SC INDEX (AK_SERVICOCOMPOSICAO_ITEM),P INDEX
(RDB$PRIMARY10),LP INDEX (RDB$PRIMARY8)))
Execution Time: 00:00:00.0010 (1st time)
Execution Time: 00:00:00.0000 (2nd time)
Execution Time: 00:00:00.0000 (3rd time)
Execution Time: 00:00:00.0000 (4rd time)
This query returns 7 rows.
the result is instantaneous. no problem here.
But if I create a view like this:
CREATE VIEW VSERVICOCOMPOSICAOLINHAPRODUTO (
ID,
LINHAPRODUTOID,
LINHAPRODUTODESCRICAO,
CUSTOTOTAL
) AS
select
SC.ServicoID, LP.LinhaProdutoID, LP.Descricao, sum(SC.Quantidade *
P.PrecoCusto)
from
ServicoComposicao SC join
Produto P on (P.ProdutoID = ServicoComposicao.ProdutoID) join
LinhaProduto LP on (LP.LinhaProdutoID = P.LinhaProdutoID)
group by
SC.ServicoID, LP.LinhaProdutoID, LP.Descricao;
and send the following query:
Select * from vServicoComposicaoLinhaProduto where ID = 20310
the plan is:
PLAN SORT (JOIN (VSERVICOCOMPOSICAOLINHAPRODUTO LP
NATURAL,VSERVICOCOMPOSICAOLINHAPRODUTO P INDEX
(RDB$FOREIGN28),VSERVICOCOMPOSICAOLINHAPRODUTO SC INDEX (RDB$FOREIGN23)))
Execution Time: 00:00:00.0631 (1st time)
Execution Time: 00:00:00.0621 (2nd time)
Execution Time: 00:00:00.0641 (3rd time)
Execution Time: 00:00:00.0641 (4rd time)
This query returns 7 rows too (of course).
I know the response time is tiny, but I can see a big diference when run
the query, the first just appears, the second one I see the delay.
Comparing the execution time reported by IBConsole the first one is from 60
times faster to infinitum times faster :-). And I hope these tables will
grow up to 10 times bigger than it was now.
Why optimizer choose to do a natural search on table LinhaProduto, why not
choose the PK Index of table LinhaProduto ?
I think the plan of the two queries should be the same, since I am doing
the same query, just that the second is trough a view, I thought tyhat the
view just encapsulates the query making it more readable, when someone send
a query using a view the "engine" substitute the view by the equivalent
statment, but now I am sure I am wrong with this. As far I can recall, I do
some tests sending to queries like the above in MSSQL and always the plan
is the same.
I noticed another strange (in my opinion) behavior of the engine, I am
sorry, but I need to find it again to show you, the optimizer choose to do
a natural search on one table and I cannot force the plan to use an index
and the engine only execute the query in a natural way.
Below are some statistics from the database:
Database header page information:
Flags 0
Checksum 12345
Generation 84
Page size 4096
ODS version 10.0
Oldest transaction 65
Oldest active 66
Oldest snapshot 66
Next transaction 78
Bumped transaction 1
Sequence number 0
Next attachment ID 0
Implementation ID 16
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 1
Creation date May 2, 2002 11:47:58
LINHAPRODUTO (134)
Primary pointer page: 177, Index root page: 178
Data pages: 4, data page slots: 4, average fill: 63%
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 3
80 - 99% = 0
Index RDB$PRIMARY8 (0)
Depth: 1, leaf buckets: 1, nodes: 186
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 1
40 - 59% = 0
60 - 79% = 0
80 - 99% = 0
Index SK_LINHAPRODUTO_DESCRICAO (1)
Depth: 1, leaf buckets: 1, nodes: 186
Average data length: 13.00, total dup: 47, max dup: 17
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 1
PRODUTO (136)
Primary pointer page: 181, Index root page: 182
Data pages: 149, data page slots: 149, average fill: 86%
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 0
80 - 99% = 148
Index AK_PRODUTO_CODIGO (1)
Depth: 2, leaf buckets: 6, nodes: 3139
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 1
80 - 99% = 5
Index RDB$FOREIGN28 (3)
Depth: 2, leaf buckets: 5, nodes: 3139
Average data length: 0.00, total dup: 2965, max dup: 86
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 1
80 - 99% = 4
Index RDB$PRIMARY10 (0)
Depth: 2, leaf buckets: 6, nodes: 3139
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 0
80 - 99% = 5
Index SK_PRODUTO_DESCRICAO (2)
Depth: 2, leaf buckets: 18, nodes: 3139
Average data length: 16.00, total dup: 40, max dup: 24
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 1
80 - 99% = 17
SERVICOCOMPOSICAO (130)
Primary pointer page: 168, Index root page: 169
Data pages: 303, data page slots: 303, average fill: 67%
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 303
80 - 99% = 0
Index AK_SERVICOCOMPOSICAO_ITEM (1)
Depth: 2, leaf buckets: 34, nodes: 17404
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 33
Index RDB$FOREIGN22 (2)
Depth: 2, leaf buckets: 27, nodes: 17404
Average data length: 0.00, total dup: 13626, max dup: 45
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 27
Index RDB$FOREIGN23 (3)
Depth: 2, leaf buckets: 27, nodes: 17404
Average data length: 0.00, total dup: 14607, max dup: 1172
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 0
60 - 79% = 1
80 - 99% = 26
Index RDB$PRIMARY4 (0)
Depth: 2, leaf buckets: 31, nodes: 17404
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 30
The results are the same on Firebird 1 on WinXP and on Firebird
1.0.0-0a.Beta2 on RedHat Linux.
Sorry for my bad English and for the long message, and I hope you could
understand me.
[]s
Alexandre