firebird-support - Very bad performance when joining two big tables.

Subject	Very bad performance when joining two big tables.
Author	un_spoken
Post date	2012-12-12T13:15:29Z

Hi guys.

Database in one of our clients has reached the point where it is no longer possible to get results in some reasonable time.

In the database we have two tables: TASK - around 40 million of rows nad LOCATION around 20 million of rows.

This is the query:

SELECT
T.ID_TASK, L.LATITUDE, L.LONGITUDE
FROM
TASK T
INNER JOIN
LOCATION L
ON
T.IDLOCATION = L.ID_LOCATION
WHERE
T.IDSHEDULE IN (8169797)

The above query is returning 1900 rows.

When I run the query for the first time:

Starting transaction...
Preparing statement: SELECT T.ID_TASK, L.LATITUDE, L.LONGITUDE
FROM TASK T
INNER JOIN
LOCATION L
ON
T.IDLOCATION = L.ID_LOCATION
WHERE T.IDSHEDULE IN (8169797 )
Statement prepared (elapsed time: 4.212s).
Field #01: TASK.ID_TASK Alias:ID_TASK Type:INTEGER
Field #02: LOCATION.LATITUDE Alias:LATITUDE Type:INTEGER
Field #03: LOCATION.LONGITUDE Alias:LONGITUDE Type:INTEGER
PLAN JOIN (T INDEX (IDX_TASK_IDSHEDULE), L INDEX (PK_LOCATION))

Executing statement...
Statement executed (elapsed time: 0.000s).
58265 fetches, 8 marks, 2524 reads, 8 writes.
0 inserts, 0 updates, 0 deletes, 4837 index, 18924 seq.
Delta memory: 75868 bytes.
Total execution time: 21.262s
Script execution finished.

When I run the query for the second time:

Starting transaction...
Preparing statement: SELECT T.ID_TASK, L.LATITUDE, L.LONGITUDE
FROM TASK T
INNER JOIN
LOCATION L
ON
T.IDLOCATION = L.ID_LOCATION
WHERE T.IDSHEDULE IN (8169797 )
Statement prepared (elapsed time: 0.000s).
Field #01: TASK.ID_TASK Alias:ID_TASK Type:INTEGER
Field #02: LOCATION.LATITUDE Alias:LATITUDE Type:INTEGER
Field #03: LOCATION.LONGITUDE Alias:LONGITUDE Type:INTEGER
PLAN JOIN (T INDEX (IDX_TASK_IDSHEDULE), L INDEX (PK_LOCATION))

Executing statement...
Statement executed (elapsed time: 0.000s).
16288 fetches, 0 marks, 2441 reads, 0 writes.
0 inserts, 0 updates, 0 deletes, 3997 index, 0 seq.
Delta memory: 23680 bytes.
Total execution time: 0.109s
Script execution finished.

As you see, when I run query for the second time it executes over 200 times faster. I already know the reason: pages have been cached into the memory. But the first execution is crucial for me.

The "funny" thing is that if I run gstat for table LOCATION, it will consume 80% of available memory but then queries for different IDSHEDULE will run at decent speed.

I've tried to increase the page size to 16384 it did not help.
I've tried to rebuild index PK_LOCATION but that did not help. And please take a look at its fill distribution - is it ok?

What else can I do? I can not add RAM or change any hardware.

Here is the info for both used tables:

Database "E:\TESTS_2\#3635 - Prototyp drzewa\Baza\7_16384.FDB"
Database header page information:
Flags 0
Checksum 12345
Generation 43960
Page size 16384
ODS version 11.2
Oldest transaction 42400
Oldest active 42401
Oldest snapshot 42401
Next transaction 42420
Bumped transaction 1
Sequence number 0
Next attachment ID 1534
Implementation ID 26
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Dec 10, 2012 9:43:49
Attributes force write

Variable header data:
Sweep interval: 20000
*END*

Database file sequence:
File E:\TESTS_2\#3635 - Prototyp drzewa\Baza\7_16384.FDB is the only file

Analyzing database pages ...
LOCATION (224)
Primary pointer page: 428, Index root page: 429
Data pages: 267028, data page slots: 267028, average fill: 89%
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 267027

Index IDX_LOCATION2 (1)
Depth: 3, leaf buckets: 7156, nodes: 20220368
Average data length: 0.00, total dup: 20220366, max dup: 19953349
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 7155

Index IDX_LOCATION_IDCL_PCSH (6)
Depth: 3, leaf buckets: 45454, nodes: 20220368
Average data length: 29.91, total dup: 6982193, max dup: 245
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 4
60 - 79% = 1
80 - 99% = 45449

Index IDX_LOCATION_LATLON (7)
Depth: 3, leaf buckets: 7664, nodes: 20220368
Average data length: 0.27, total dup: 19103978, max dup: 920366
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 86
60 - 79% = 1
80 - 99% = 7577

Index IDX_LOCATION_RA_P_C (5)
Depth: 3, leaf buckets: 7452, nodes: 20220368
Average data length: 0.19, total dup: 19897684, max dup: 117558
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 64
60 - 79% = 0
80 - 99% = 7387

Index IDX_LOCATION_RT (4)
Depth: 3, leaf buckets: 7842, nodes: 20220368
Average data length: 0.51, total dup: 10623192, max dup: 71500
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 7841

Index LOCATION_ID_LOCATION_A (2)
Depth: 3, leaf buckets: 8418, nodes: 20220368
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 0
80 - 99% = 8417

Index LOCATION_ID_LOCATION_D (3)
Depth: 3, leaf buckets: 8628, nodes: 20220368
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 1
40 - 59% = 2
60 - 79% = 0
80 - 99% = 8625

Index PK_LOCATION (0)
Depth: 3, leaf buckets: 8418, nodes: 20220368
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 0
80 - 99% = 8417

Database "E:\TESTS_2\#3635 - Prototyp drzewa\Baza\7_16384.FDB"
Database header page information:
Flags 0
Checksum 12345
Generation 42108
Page size 16384
ODS version 11.2
Oldest transaction 40413
Oldest active 40414
Oldest snapshot 40414
Next transaction 40581
Bumped transaction 1
Sequence number 0
Next attachment ID 1521
Implementation ID 26
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Dec 10, 2012 9:43:49
Attributes force write

Variable header data:
Sweep interval: 20000
*END*

Database file sequence:
File E:\TESTS_2\#3635 - Prototyp drzewa\Baza\7_16384.FDB is the only file

Analyzing database pages ...
TASK (205)
Primary pointer page: 390, Index root page: 391
Data pages: 1018464, data page slots: 1018464, average fill: 91%
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 0
60 - 79% = 0
80 - 99% = 1018463

Index IDX_IDS_DATE_RA (14)
Depth: 3, leaf buckets: 27335, nodes: 73798656
Average data length: 0.04, total dup: 73497999, max dup: 3418171
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 6
60 - 79% = 1
80 - 99% = 27328

Index IDX_TASK_DODATE_NUMBER_RA (16)
Depth: 3, leaf buckets: 27138, nodes: 73798652
Average data length: 0.03, total dup: 71943618, max dup: 4245544
Fill distribution:
0 - 19% = 0
20 - 39% = 1
40 - 59% = 3
60 - 79% = 0
80 - 99% = 27134

Index IDX_TASK_IDCONTRACT (11)
Depth: 3, leaf buckets: 26988, nodes: 73798652
Average data length: 0.00, total dup: 73798651, max dup: 73798651
Fill distribution:
0 - 19% = 0
20 - 39% = 1
40 - 59% = 0
60 - 79% = 0
80 - 99% = 26987

Index IDX_TASK_IDRFID (12)
Depth: 3, leaf buckets: 26988, nodes: 73798652
Average data length: 0.00, total dup: 73798651, max dup: 73798651
Fill distribution:
0 - 19% = 0
20 - 39% = 1
40 - 59% = 0
60 - 79% = 0
80 - 99% = 26987

Index IDX_TASK_IDSHEDULE (10)
Depth: 3, leaf buckets: 27035, nodes: 73798656
Average data length: 0.01, total dup: 73554966, max dup: 14657217
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 6
60 - 79% = 0
80 - 99% = 27028

Index IDX_TASK_RA_IDC_NN (9)
Depth: 3, leaf buckets: 33234, nodes: 73798652
Average data length: 0.95, total dup: 56969131, max dup: 1848518
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 2220
60 - 79% = 0
80 - 99% = 31013

Index IDX_TASK_RFID_CODE (13)
Depth: 3, leaf buckets: 27018, nodes: 73798652
Average data length: 0.00, total dup: 73798651, max dup: 73798651
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 0
80 - 99% = 27017

Index PK_TASK (0)
Depth: 3, leaf buckets: 31607, nodes: 73798652
Average data length: 1.00, total dup: 0, max dup: 0
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 1
60 - 79% = 0
80 - 99% = 31606

Index TASK_DODATE_A (1)
Depth: 3, leaf buckets: 27010, nodes: 73798652
Average data length: 0.00, total dup: 73798242, max dup: 4512545
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 3
60 - 79% = 0
80 - 99% = 27006

Index TASK_DODATE_D (2)
Depth: 3, leaf buckets: 27011, nodes: 73798652
Average data length: 0.00, total dup: 73798242, max dup: 4512545
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 6
60 - 79% = 1
80 - 99% = 27004

Index TASK_IDCLIENT_A (3)
Depth: 3, leaf buckets: 28885, nodes: 73798652
Average data length: 0.16, total dup: 62369870, max dup: 2002164
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 2282
60 - 79% = 0
80 - 99% = 26603

Index TASK_IDCLIENT_D (4)
Depth: 3, leaf buckets: 29006, nodes: 73798652
Average data length: 0.16, total dup: 62369870, max dup: 2002164
Fill distribution:
0 - 19% = 1
20 - 39% = 0
40 - 59% = 2266
60 - 79% = 0
80 - 99% = 26739

Index TASK_IDLOCATION_A (5)
Depth: 3, leaf buckets: 28891, nodes: 73798652
Average data length: 0.16, total dup: 62367812, max dup: 2015520
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 2290
60 - 79% = 0
80 - 99% = 26601

Index TASK_IDLOCATION_D (6)
Depth: 3, leaf buckets: 28987, nodes: 73798652
Average data length: 0.16, total dup: 62367812, max dup: 2015520
Fill distribution:
0 - 19% = 2
20 - 39% = 0
40 - 59% = 2228
60 - 79% = 0
80 - 99% = 26757

Index TASK_OUTNUMBER_RA (15)
Depth: 3, leaf buckets: 33245, nodes: 73798652
Average data length: 0.96, total dup: 48983781, max dup: 2706647
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 666
60 - 79% = 9
80 - 99% = 32570

Index TASK_RECORD_ACTIVE_A (7)
Depth: 3, leaf buckets: 26996, nodes: 73798652
Average data length: 0.00, total dup: 73798650, max dup: 56595571
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 2
60 - 79% = 0
80 - 99% = 26994

Index TASK_RECORD_ACTIVE_D (8)
Depth: 3, leaf buckets: 26998, nodes: 73798652
Average data length: 0.00, total dup: 73798650, max dup: 56595571
Fill distribution:
0 - 19% = 0
20 - 39% = 0
40 - 59% = 6
60 - 79% = 0
80 - 99% = 26992

Thank you for your time and any advice.