Subject Re: [firebird-support] Full outer join poor performance.
Author agharta
On 05/17/2016 10:51 PM, setysvar setysvar@... [firebird-support] wrote:
 

Den 17.05.2016 12:20, skrev agharta agharta82@... [firebird-support]:
> Hi all, this is my first post and my English is bad, please be patient :-)
>
> I've a problem with a full outer join, let me explain.
>
> 2 tables (100000 rows each), 2 CTEs, 2 full outer joins, 100 rows output.
> No indexes. Execution takes about 40s... in Postgres (I've done an
> equivalent comparison test) it takes about 270ms.
>
> I've tested it on FB 2.1.7, 2.5.5 and 3.0.0. The OS is a virtual 64-bit
> Windows Server 2012 with 8GB RAM, Classic Server, Dialect 3, no firebird.conf
> optimizations/modifications, no SSD, no RAID, page size 16KB.
>
> The test query may be stupid, but it shows exactly the performance problem.
>
> Create table script:
>
> TABLE 1:
> CREATE SEQUENCE GEN_NEW_TABLE_ID;
>
> CREATE TABLE NEW_TABLE (
> ID INTEGER NOT NULL,
> FIELD1 INTEGER,
> FIELD2 INTEGER,
> FIELD3 INTEGER,
> FIELD4 INTEGER,
> FIELD5 INTEGER
> );
>
> ALTER TABLE NEW_TABLE ADD CONSTRAINT PK_NEW_TABLE PRIMARY KEY (ID);
>
> CREATE OR ALTER TRIGGER NEW_TABLE_BI FOR NEW_TABLE
> ACTIVE BEFORE INSERT POSITION 0
> as
> begin
> new.id = gen_id(gen_new_table_id,1); --or use next value for
> end
>
> TABLE 2:
> CREATE SEQUENCE GEN_NEW_TABLE2_ID;
>
> CREATE TABLE NEW_TABLE2 (
> ID INTEGER NOT NULL,
> FIELD1 INTEGER,
> FIELD2 INTEGER,
> FIELD3 INTEGER,
> FIELD4 INTEGER,
> FIELD5 INTEGER
> );
>
> ALTER TABLE NEW_TABLE2 ADD CONSTRAINT PK_NEW_TABLE2 PRIMARY KEY (ID);
>
> CREATE OR ALTER TRIGGER NEW_TABLE2_BI FOR NEW_TABLE2
> ACTIVE BEFORE INSERT POSITION 0
> as
> begin
> new.id = gen_id(gen_new_table2_id,1); --or use next value for
> end
>
> Stupid bulk insert 100000 records in each table (same records in each
> table):
>
> execute block
> as
> declare variable i int =0;
> declare variable x int =0;
> begin
> while (x < 100) do
> begin
> i =0;
> while (i < 1000) do
> begin
> insert into NEW_TABLE values (:i, :i, :i, :i, :i, :i);
> insert into NEW_TABLE2 values (:i, :i, :i, :i, :i, :i);
> i = i + 1;
> end
> x = x + 1;
> end
> end
>
> Ok, now the query:
>
> with
> T2CTE as (select first 10 * from new_table2 where field1=1),
> T1CTE as (select first 100 * from new_table where field1=1)
>
> select t1.id, t1.FIELD1, t1.FIELD2, t1.FIELD3, t1.FIELD4
> from T1CTE t1
> full outer join T2CTE t2 on (t2.field1 = t1.field1 and t2.field2 =
> t1.field2 and t2.field3 = t1.field3 and t2.field4 = t1.field4 )
> full outer join T1CTE t1x on (t1x.field1 = t1.field1 and t1x.field2 =
> t1.field2 and t1x.field3 = t1.field3 and t1x.field4 = t1.field4)
> group by t1.id, t1.FIELD1, t1.FIELD2, t1.FIELD3, t1.FIELD4
>
> Yes, it's stupid, but it shows me the problem (IBExpert output):
>
> Plan
> PLAN SORT (JOIN (T1X NEW_TABLE NATURAL, JOIN (T2 NEW_TABLE2 NATURAL, T1
> NEW_TABLE NATURAL)))
>
> Prepare time = 16ms
> Execute time = 40s 750ms
> Avg fetch time = 1.455,36 ms
> Current memory = 137.242.916
> Max memory = 157.081.992
> Memory buffers = 8.192
> Reads from disk to cache = 0
> Writes from cache to disk = 0
> Fetches from cache = 222.991.689
>
> Again, the performance reads summary says:
>
> Non indexed reads:
> NEW_TABLE: 110092224
> NEW_TABLE2: 929402
>
> 110092224? 929402? WHY? The CTEs EXPLICITLY say 100 and 10!
>
> I've repeated the test with PostgreSQL (same machine, Postgres 9.5, same
> standard installation and same table data) and the average execution time is
> about 270ms. It shows me the effective CTE scan:
>
> HashAggregate  (cost=2088.17..2089.16 rows=99 width=20)
>   Group Key: t1.id, t1.field1, t1.field2, t1.field3, t1.field4
>   CTE t2cte
>     ->  Limit  (cost=0.00..190.61 rows=10 width=24)
>           ->  Seq Scan on new_table2  (cost=0.00..1887.00 rows=99 width=24)
>                 Filter: (field1 = 1)
>   CTE t1cte
>     ->  Limit  (cost=0.00..1887.00 rows=99 width=24)
>           ->  Seq Scan on new_table  (cost=0.00..1887.00 rows=99 width=24)
>                 Filter: (field1 = 1)
>   ->  Hash Full Join  (cost=4.36..9.33 rows=99 width=20)
>         Hash Cond: ((t1.field1 = t1x.field1) AND (t1.field2 = t1x.field2) AND (t1.field3 = t1x.field3) AND (t1.field4 = t1x.field4))
>         ->  Hash Full Join  (cost=0.40..3.87 rows=99 width=20)
>               Hash Cond: ((t1.field1 = t2.field1) AND (t1.field2 = t2.field2) AND (t1.field3 = t2.field3) AND (t1.field4 = t2.field4))
>               ->  CTE Scan on t1cte t1  (cost=0.00..1.98 rows=99 width=20)
>               ->  Hash  (cost=0.20..0.20 rows=10 width=16)
>                     ->  CTE Scan on t2cte t2  (cost=0.00..0.20 rows=10 width=16)
>         ->  Hash  (cost=1.98..1.98 rows=99 width=16)
>               ->  CTE Scan on t1cte t1x  (cost=0.00..1.98 rows=99 width=16)
>
> It seems (I'm not a master of the known universe in Firebird) that Firebird
> performs a full table scan and joins much more data instead of the CTE
> results.
>
> Any ideas why FB takes so long to execute the query (and scans so many
> rows)? Have I omitted some CTE syntax that would force FB to use it?
>
> The question is not about optimizing the query (the full outer joins can be
> omitted, I know), but about why FB performs so many reads instead of the
> 100 + 10 CTE reads.
>
> Please correct me, because I'm sure I've done something erroneously.
>
> Best regards,
>
> Agharta
Hi Agharta, thanks for a perfect description of your problem!

Does the performance change if you add another CTE and change the query
like below?

with
T2CTE as (select first 10 * from new_table2 where field1=1),
T1CTE as (select first 100 * from new_table where field1=1),
T3CTE as (select * from T1CTE union select * from T2CTE)

select t1.id, t1.FIELD1, t1.FIELD2, t1.FIELD3, t1.FIELD4
from T3CTE t3
left join T1CTE t1
on t3.field1 = t1.field1 and t3.field2 = t1.field2 and t3.field3 = t1.field3 and t3.field4 = t1.field4
left join T2CTE t2
on t3.field1 = t2.field1 and t3.field2 = t2.field2 and t3.field3 = t2.field3 and t3.field4 = t2.field4
left join T1CTE t1x
on t3.field1 = t1x.field1 and t3.field2 = t1x.field2 and t3.field3 = t1x.field3 and t3.field4 = t1x.field4
group by t1.id, t1.FIELD1, t1.FIELD2, t1.FIELD3, t1.FIELD4

I think the result should be identical (though I am slightly uncertain
about the case where there are, say, three identical rows in both T1CTE and T2CTE), and it
wouldn't surprise me if it was quicker (I think I once did something
similar myself when a query took too long, though I don't remember which
Firebird version I then used). If it works, then I would call it a
workaround, and agree that Firebird is not great for FULL OUTER JOIN on
larger tables.
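
If that rewrite doesn't help either, the classic emulation of FULL OUTER JOIN might: a LEFT JOIN covers all left-side rows, and a UNION ALL adds the right-side rows that have no match. This is only a sketch for the first of your two full joins (I haven't run it against your data); it keeps your projection of t1 columns only, so unmatched T2CTE rows come back as NULLs, just as they would from the full join:

with
T2CTE as (select first 10 * from new_table2 where field1 = 1),
T1CTE as (select first 100 * from new_table where field1 = 1)
select t1.id, t1.field1, t1.field2, t1.field3, t1.field4
from T1CTE t1
left join T2CTE t2
  on t2.field1 = t1.field1 and t2.field2 = t1.field2
 and t2.field3 = t1.field3 and t2.field4 = t1.field4
union all
-- Right-side rows with no left match; cast the NULLs so the
-- union column types stay INTEGER.
select cast(null as integer), cast(null as integer), cast(null as integer),
       cast(null as integer), cast(null as integer)
from T2CTE t2
where not exists (
  select 1 from T1CTE t1
  where t1.field1 = t2.field1 and t1.field2 = t2.field2
    and t1.field3 = t2.field3 and t1.field4 = t2.field4
)

Whether the optimizer handles the NOT EXISTS better than the full join is something you would have to measure.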

HTH,
Set

Hi Set,

Many thanks for your reply!

I've tested your query, but no luck: I interrupted it after 3 minutes.

Dimitry Sibiryakov answered me that FB does not support hash joins, which is why the query is slow.

Karol Bieniaszewski answered me that hash joins already have a tracker entry: http://tracker.firebirdsql.org/browse/CORE-4823

OK, I'll wait for the implementation.
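
In the meantime, a possible workaround (just a sketch, I haven't tested it yet) is to materialize the CTE results into global temporary tables first, so the full outer joins only ever see 100 + 10 rows instead of the 100000-row base tables:

-- Per-transaction scratch tables (ON COMMIT DELETE ROWS); one-time DDL.
create global temporary table tmp_t1 (
  id integer, field1 integer, field2 integer,
  field3 integer, field4 integer
) on commit delete rows;

create global temporary table tmp_t2 (
  id integer, field1 integer, field2 integer,
  field3 integer, field4 integer
) on commit delete rows;

-- At query time: fill the temp tables with the would-be CTE results...
insert into tmp_t1
  select first 100 id, field1, field2, field3, field4
  from new_table where field1 = 1;

insert into tmp_t2
  select first 10 id, field1, field2, field3, field4
  from new_table2 where field1 = 1;

-- ...then run the same query shape against the tiny temp tables.
select t1.id, t1.field1, t1.field2, t1.field3, t1.field4
from tmp_t1 t1
full outer join tmp_t2 t2
  on t2.field1 = t1.field1 and t2.field2 = t1.field2
 and t2.field3 = t1.field3 and t2.field4 = t1.field4
full outer join tmp_t1 t1x
  on t1x.field1 = t1.field1 and t1x.field2 = t1.field2
 and t1x.field3 = t1.field3 and t1x.field4 = t1.field4
group by t1.id, t1.field1, t1.field2, t1.field3, t1.field4;

Even a nested-loop full join over inputs this small should finish in milliseconds.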


Many thanks again.

Best regards,

Agharta