Subject | Re: [firebird-support] Query that was very fast under the oldest 2.5 versions are very slow under 2.5.8 or 2.5.9 |
---|---|
Author | setysvar |
Post date | 2019-05-25T23:38:01Z |
One of the first things Firebird taught me (well, I don't remember
whether if it was Firebird 0.9.4 or InterBase 5.6), was to never use
IN(<subselect>). At that time, the optimizer didn't check if the
subselect was correlated(*) or not, and hence executed the subselect for
every possible row in the outer select.
The optimizer has improved since then, but I got used to using EXISTS
and have never wanted to try IN(<subselect>) again and don't know which
pitfalls still exist. I expect that your use of DISTINCT adds to the
difficulties (and order by PROC ignored), but I don't know.
Try replacing your query with
select PROC from PROC
where exists(select * from MOVI
where MOVI.TIPO in ('1','A','B')
and MOVI.FECH between '20190301' and '20190412'
and MOVI.MIEM = '15JMS45D7A'
and MOVI.HECH = 'N'
and PROC.PROC = MOVI.PROC)
I assume you have indexes for MOVI.FECH, MOVI.MIEM and possibly
MOVI.PROC (those seems like selective fields for this particular query).
If such indexes are active, this doesn't seem like a query that should
take long (24 minutes sounds like indexes not being active for MOVI),
even though the EXISTS has to be executed 46000 times.
Alternatively, you could maybe rewrite the query to
select distinct PROC.PROC
from PROC
join MOVI on PROC.PROC = MOVI.PROC
where MOVI.TIPO in ('1','A','B')
and MOVI.FECH between '20190301' and '20190412'
and MOVI.MIEM = '15JMS45D7A'
and MOVI.HECH = 'N'
However, this will eliminate duplicates. You can keep duplicates with
adding a level, e.g.
with TMP(PROC) as
(select distinct PROC.PROC
from PROC
join MOVI on PROC.PROC = MOVI.PROC
where MOVI.TIPO in ('1','A','B')
and MOVI.FECH between '20190301' and '20190412'
and MOVI.MIEM = '15JMS45D7A'
and MOVI.HECH = 'N')
select PROC.PROC
from TMP
join PROC on TMP.PROC = PROC.PROC
If this doesn't help, please show the plans Karol requests (and tell us
something about the definition of indexes referenced and the selectivity
of fields involved in these indexes).
HTH,
Set
(*) with "correlated or not" I mean that the optimizer didn't understand
that the subselect in
SELECT *
FROM A
WHERE A.FIELD IN (SELECT B.FIELD FROM B)
can be executed only once and the result reused, as opposed to the
subselect in
SELECT *
FROM A
WHERE A.FIELD IN (SELECT B.FIELD FROM B WHERE A.FIELD2 = B.FIELD2)
that has to be executed repeatedly.
whether if it was Firebird 0.9.4 or InterBase 5.6), was to never use
IN(<subselect>). At that time, the optimizer didn't check if the
subselect was correlated(*) or not, and hence executed the subselect for
every possible row in the outer select.
The optimizer has improved since then, but I got used to using EXISTS
and have never wanted to try IN(<subselect>) again and don't know which
pitfalls still exist. I expect that your use of DISTINCT adds to the
difficulties (and order by PROC ignored), but I don't know.
Try replacing your query with
select PROC from PROC
where exists(select * from MOVI
where MOVI.TIPO in ('1','A','B')
and MOVI.FECH between '20190301' and '20190412'
and MOVI.MIEM = '15JMS45D7A'
and MOVI.HECH = 'N'
and PROC.PROC = MOVI.PROC)
I assume you have indexes for MOVI.FECH, MOVI.MIEM and possibly
MOVI.PROC (those seems like selective fields for this particular query).
If such indexes are active, this doesn't seem like a query that should
take long (24 minutes sounds like indexes not being active for MOVI),
even though the EXISTS has to be executed 46000 times.
Alternatively, you could maybe rewrite the query to
select distinct PROC.PROC
from PROC
join MOVI on PROC.PROC = MOVI.PROC
where MOVI.TIPO in ('1','A','B')
and MOVI.FECH between '20190301' and '20190412'
and MOVI.MIEM = '15JMS45D7A'
and MOVI.HECH = 'N'
However, this will eliminate duplicates. You can keep duplicates with
adding a level, e.g.
with TMP(PROC) as
(select distinct PROC.PROC
from PROC
join MOVI on PROC.PROC = MOVI.PROC
where MOVI.TIPO in ('1','A','B')
and MOVI.FECH between '20190301' and '20190412'
and MOVI.MIEM = '15JMS45D7A'
and MOVI.HECH = 'N')
select PROC.PROC
from TMP
join PROC on TMP.PROC = PROC.PROC
If this doesn't help, please show the plans Karol requests (and tell us
something about the definition of indexes referenced and the selectivity
of fields involved in these indexes).
HTH,
Set
(*) with "correlated or not" I mean that the optimizer didn't understand
that the subselect in
SELECT *
FROM A
WHERE A.FIELD IN (SELECT B.FIELD FROM B)
can be executed only once and the result reused, as opposed to the
subselect in
SELECT *
FROM A
WHERE A.FIELD IN (SELECT B.FIELD FROM B WHERE A.FIELD2 = B.FIELD2)
that has to be executed repeatedly.