Subject Re: [firebird-support] Full outer join poor performance.
Author setysvar
Den 17.05.2016 12:20, skrev agharta agharta82@... [firebird-support]:
> Hi all, this is my first post and my English is bad, so please be patient :-)
>
> I've a problem with a full outer join, let me explain.
>
> 2 tables (100,000 rows each), 2 CTEs, 2 full outer joins, 100 rows of output.
> No indexes. Execution takes about 40 s... in PostgreSQL (I ran an equivalent
> comparison test)... 270 ms.
>
> I've tested it on FB 2.1.7, 2.5.5 and 3.0.0. The OS is a virtual 64-bit
> Windows Server 2012 with 8 GB RAM; Classic Server, Dialect 3, no firebird.conf
> tuning, no SSD, no RAID, page size 16 KB.
>
> The test query may be stupid, but it shows the performance problem exactly.
>
> Create table script:
>
> TABLE 1:
> CREATE SEQUENCE GEN_NEW_TABLE_ID;
>
> CREATE TABLE NEW_TABLE (
> ID INTEGER NOT NULL,
> FIELD1 INTEGER,
> FIELD2 INTEGER,
> FIELD3 INTEGER,
> FIELD4 INTEGER,
> FIELD5 INTEGER
> );
>
> ALTER TABLE NEW_TABLE ADD CONSTRAINT PK_NEW_TABLE PRIMARY KEY (ID);
>
> CREATE OR ALTER TRIGGER NEW_TABLE_BI FOR NEW_TABLE
> ACTIVE BEFORE INSERT POSITION 0
> as
> begin
> new.id = gen_id(gen_new_table_id,1); --or use next value for
> end
>
> TABLE 2:
> CREATE SEQUENCE GEN_NEW_TABLE2_ID;
>
> CREATE TABLE NEW_TABLE2 (
> ID INTEGER NOT NULL,
> FIELD1 INTEGER,
> FIELD2 INTEGER,
> FIELD3 INTEGER,
> FIELD4 INTEGER,
> FIELD5 INTEGER
> );
>
> ALTER TABLE NEW_TABLE2 ADD CONSTRAINT PK_NEW_TABLE2 PRIMARY KEY (ID);
>
> CREATE OR ALTER TRIGGER NEW_TABLE2_BI FOR NEW_TABLE2
> ACTIVE BEFORE INSERT POSITION 0
> as
> begin
> new.id = gen_id(gen_new_table2_id,1); --or use next value for
> end
>
> A simple bulk insert of 100,000 records into each table (the same records in
> each table):
>
> execute block
> as
> declare variable i int =0;
> declare variable x int =0;
> begin
> while (x < 100) do
> begin
> i =0;
> while (i < 1000) do
> begin
> insert into NEW_TABLE values (:i, :i, :i, :i, :i, :i);
> insert into NEW_TABLE2 values (:i, :i, :i, :i, :i, :i);
> i = i + 1;
> end
> x = x + 1;
> end
> end
>
> Ok, now the query:
>
> with
> T2CTE as (select first 10 * from new_table2 where field1=1),
> T1CTE as (select first 100 * from new_table where field1=1)
>
> select t1.id, t1.FIELD1, t1.FIELD2, t1.FIELD3, t1.FIELD4
> from T1CTE t1
> full outer join T2CTE t2 on (t2.field1 = t1.field1 and t2.field2 =
> t1.field2 and t2.field3 = t1.field3 and t2.field4 = t1.field4 )
> full outer join T1CTE t1x on (t1x.field1 = t1.field1 and t1x.field2 =
> t1.field2 and t1x.field3 = t1.field3 and t1x.field4 = t1.field4)
> group by t1.id, t1.FIELD1, t1.FIELD2, t1.FIELD3, t1.FIELD4
>
> Yes, it's a silly query, but it demonstrates the problem (IBExpert output):
>
> Plan
> PLAN SORT (JOIN (T1X NEW_TABLE NATURAL, JOIN (T2 NEW_TABLE2 NATURAL, T1
> NEW_TABLE NATURAL)))
>
> Prepare time = 16ms
> Execute time = 40s 750ms
> Avg fetch time = 1.455,36 ms
> Current memory = 137.242.916
> Max memory = 157.081.992
> Memory buffers = 8.192
> Reads from disk to cache = 0
> Writes from cache to disk = 0
> Fetches from cache = 222.991.689
>
> Looking at the reads summary in the performance analysis, it reports:
>
> Non indexed reads:
> NEW_TABLE: 110092224
> NEW_TABLE2: 929402
>
> 110,092,224? 929,402? WHY? The CTEs EXPLICITLY say 100 and 10!
>
> I repeated the test with PostgreSQL (same machine, PostgreSQL 9.5, same
> standard installation and same table data); the average execution time is
> about 270 ms. Its plan shows the effective CTE scans:
>
> HashAggregate (cost=2088.17..2089.16 rows=99 width=20)
> Group Key: t1.id, t1.field1, t1.field2, t1.field3, t1.field4
> CTE t2cte
> -> Limit (cost=0.00..190.61 rows=10 width=24)
> -> Seq Scan on new_table2 (cost=0.00..1887.00 rows=99 width=24)
> Filter: (field1 = 1)
> CTE t1cte
> -> Limit (cost=0.00..1887.00 rows=99 width=24)
> -> Seq Scan on new_table (cost=0.00..1887.00 rows=99 width=24)
> Filter: (field1 = 1)
> -> Hash Full Join (cost=4.36..9.33 rows=99 width=20)
> Hash Cond: ((t1.field1 = t1x.field1) AND (t1.field2 =
> t1x.field2) AND (t1.field3 = t1x.field3) AND (t1.field4 = t1x.field4))
> -> Hash Full Join (cost=0.40..3.87 rows=99 width=20)
> Hash Cond: ((t1.field1 = t2.field1) AND (t1.field2 =
> t2.field2) AND (t1.field3 = t2.field3) AND (t1.field4 = t2.field4))
> -> CTE Scan on t1cte t1 (cost=0.00..1.98 rows=99 width=20)
> -> Hash (cost=0.20..0.20 rows=10 width=16)
> -> CTE Scan on t2cte t2 (cost=0.00..0.20 rows=10
> width=16)
> -> Hash (cost=1.98..1.98 rows=99 width=16)
> -> CTE Scan on t1cte t1x (cost=0.00..1.98 rows=99 width=16)
>
> It seems (I'm no master of the known Firebird universe) that Firebird
> performs full table scans and joins much more data than the CTE results.
>
> Any ideas why Firebird takes so long to execute the query (and scans so
> many rows)? Have I omitted some CTE syntax that would force Firebird to
> use the CTE results?
>
> The question is not about optimizing the query (the full outer joins could
> be removed, I know), but about why Firebird performs so many reads instead
> of the 100 + 10 reads the CTEs imply.
>
> Please correct me, because I'm sure I've done something wrong.
>
> Best regards,
>
> Agharta
Hi Agharta, thanks for a perfect description of your problem!

Does the performance change if you add another CTE and change the query
like below?

with
T2CTE as (select first 10 * from new_table2 where field1=1),
T1CTE as (select first 100 * from new_table where field1=1),
T3CTE as (select * from T1CTE union select * from T2CTE)

select t1.id, t1.FIELD1, t1.FIELD2, t1.FIELD3, t1.FIELD4
from T3CTE t3
left join T1CTE t1
on t3.field1 = t1.field1 and t3.field2 = t1.field2 and t3.field3 = t1.field3 and t3.field4 = t1.field4
left join T2CTE t2
on t3.field1 = t2.field1 and t3.field2 = t2.field2 and t3.field3 = t2.field3 and t3.field4 = t2.field4
left join T1CTE t1x
on t3.field1 = t1x.field1 and t3.field2 = t1x.field2 and t3.field3 = t1x.field3 and t3.field4 = t1x.field4
group by t1.id, t1.FIELD1, t1.FIELD2, t1.FIELD3, t1.FIELD4

I think the result should be identical (though I am slightly uncertain
about the case where, say, three identical rows appear in both T1CTE and
T2CTE), and it wouldn't surprise me if it were quicker (I think I once did
something similar myself when a query took too long, though I don't
remember which Firebird version I used then). If it works, I would call it
a workaround, and agree that Firebird is not great at FULL OUTER JOIN on
larger tables.
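
If materializing the rows outside the statement is acceptable, another
possible workaround (just a sketch, assuming Firebird 2.1 or later, where
global temporary tables are available; the TMP_T1/TMP_T2 names are made up)
is to load the filtered rows into GTTs first, so the full outer joins only
ever see 100 + 10 rows:

```sql
-- One-time DDL: per-transaction temporary tables (rows vanish on commit).
CREATE GLOBAL TEMPORARY TABLE TMP_T1 (
  ID INTEGER, FIELD1 INTEGER, FIELD2 INTEGER,
  FIELD3 INTEGER, FIELD4 INTEGER
) ON COMMIT DELETE ROWS;

CREATE GLOBAL TEMPORARY TABLE TMP_T2 (
  ID INTEGER, FIELD1 INTEGER, FIELD2 INTEGER,
  FIELD3 INTEGER, FIELD4 INTEGER
) ON COMMIT DELETE ROWS;

-- At query time: materialize the small row sets once...
INSERT INTO TMP_T1
  SELECT FIRST 100 id, field1, field2, field3, field4
  FROM new_table WHERE field1 = 1;
INSERT INTO TMP_T2
  SELECT FIRST 10 id, field1, field2, field3, field4
  FROM new_table2 WHERE field1 = 1;

-- ...then join the tiny materialized sets instead of the CTEs.
SELECT t1.id, t1.field1, t1.field2, t1.field3, t1.field4
FROM TMP_T1 t1
FULL OUTER JOIN TMP_T2 t2
  ON t2.field1 = t1.field1 AND t2.field2 = t1.field2
 AND t2.field3 = t1.field3 AND t2.field4 = t1.field4
FULL OUTER JOIN TMP_T1 t1x
  ON t1x.field1 = t1.field1 AND t1x.field2 = t1.field2
 AND t1x.field3 = t1.field3 AND t1x.field4 = t1.field4
GROUP BY t1.id, t1.field1, t1.field2, t1.field3, t1.field4;
```

Since a GTT declared with ON COMMIT DELETE ROWS is private to the
transaction, the inserts and the select just need to run inside the same
transaction.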

HTH,
Set