Subject | RE: [firebird-support] Re: Query appears to put Firebird into a loop |
---|---|
Author | Svein Erling Tysvær |
Post date | 2008-02-27T07:53:41Z |
OK Stephen, then I think I know the source of your problem - the combination of your Firebird version and your compound LD_LOAD_NUMBER_KEY.
Why? First, bear in mind that Firebird 1.5.3 is capable to use part of a compound key, but not capable of knowing the selectivity of each part of the key individually. Then look at your query and plan again:
So, your assumption that E is using the unique index may be true, but not quite correct, it is only using a small, unselective, part of the index. My simple, stupid logic that doesn't quite agree with how Firebird does things, would expect (assuming 50% of the records having 'N' in the important field, and each B record linking to one A record): 100K*0.5*40K*0.5 = 1000000000 (can that be called 1G?). That should explain why things are slow even though C and D use their primary index.
How to fix it? Well, you have a few choices, I don't know which to prefer. I tend to index all appropriate fields individually myself, rather than use compound indexes, and I don't know if I've ever used compound keys (using compound keys sounds sensible, I just lack experience). You could try changing the order of the fields in your key to avoid situations like this, or you could force E back in your plan (when it can use all fields) by using LEFT JOIN and move LD_TEMPLATE_FLAG down to the WHERE clause (this will turn the LEFT JOIN into an INNER JOIN, with the exception that Firebird cannot move E back up in the PLAN). There are other things you could try as well, but these two options would probably be my first attempts if I were in your situation.
The good part is that you've shown us a brilliant example how the optimizers lack of knowledge of each part of compound indexes in Firebird 1.5 can be disastrous, and I hope I have been able to explain that a bit above.
HTH,
Set
-----Original Message-----
From: firebird-support@yahoogroups.com [mailto:firebird-support@yahoogroups.com] On Behalf Of Stephen Boyd
Sent: 27. februar 2008 00:08
To: firebird-support@yahoogroups.com
Subject: [firebird-support] Re: Query appears to put Firebird into a loop
LDS_TEMPLATE_FLAG, LDS_LOCATION, LDS_LOAD_NUMBER, LDS_LOAD_SFX AND
LDS_STOP_NUMBER. There will never be more that 20 rows with different
stop numbers for each load. So it is quite selective.
LD_TEMPLATE_FLAG, LD_LOAD_NUMBER and LD_SUFFIX. It is highly selective.
The only part of the query that should require a scan of a large
number of records in the stuff in the WHERE clause. That would have
to scan 40K records to find the 2K that I am interested in. None of
the JOINs should be reading more that 20 rows to get the rows that I
am interested in.
LD_STOP = 100K rows
MF_STOP_STOPS = 20K rows
I find it curious that everything is fine until I JOIN the query back
to the LOADS table which, according to the plan, is using a unique key
to find the load records.
Why? First, bear in mind that Firebird 1.5.3 is capable to use part of a compound key, but not capable of knowing the selectivity of each part of the key individually. Then look at your query and plan again:
> SELECT A.LD_LOCATION, A.LD_LOAD_NUMBER, A.LD_LOAD_SUFFIX, A.LD_LH_MILES,First, the index LDS_LOAD_KEY is used on B. At this point, the only part of that key that can be used is B.LDS_TEMPLATE_FLAG = 'N' (A isn't in your plan yet, hence you have nothing to compare the other 'B.<fields>' to). As I said in my first post, unless B.LDS_TEMPLATE_FLAG = 'N' is selective, then it would be better to have used NATURAL, but it doesn't have too visible effect since it is the first alias in your PLAN (the entire index may be unique, but not this field by itself). The next part of the plan is A(LD_LOAD_NUMBER_KEY), which seems a great index. The values of B are at this point known, so all fields of the index can be used. But then, disaster strucks when using the same index for E(LD_LOAD_NUMBER_KEY)!!! D isn't in the plan yet, so the only part of the index that Firebird uses is for E.LD_TEMPLATE_FLAG = 'N'! This doesn't sound selective at all, and since it is far from the first alias in your plan, it may wreck performance completely! The main problem is what I wrote above, that Firebird only knows the complete selectivity of the index, not its individual parts. Hence, Firebird thinks that the index will be equally selective when only comparing the first field of the index, as it would have been if it had been comparing all parts of the index! (I must say that I don't know whether this is completely true, it may be that it thinks each part is equally important, and that it is 1/3 as selective, but it will not know that LD_TEMPLATE_FLAG is less selective than LD_LOAD_NUMBER).
> E.LD_LOCATION AS LOAD_LOC, E.LD_LOAD_NUMBER AS LOAD_NUM,
> E.LD_LOAD_SUFFIX AS LOAD_SFX, E.LD_NUM_STOPS, E.LD_TTL_REVENUE,
> E.LD_TTL_ASS_REVENUE, E.LD_TTL_CART_AMOUNT
> FROM LOADS A
> INNER JOIN LD_STOP B ON B.LDS_TEMPLATE_FLAG = 'N' AND
> B.LDS_LOCATION = A.LD_LOCATION AND
> B.LDS_LOAD_NUMBER = A.LD_LOAD_NUMBER AND
> B.LDS_LOAD_SFX = A.LD_LOAD_SUFFIX
> INNER JOIN MF_STOP_STOPS C ON C.MFSS_STOP_ID = B.LDS_STOP_ID
> INNER JOIN LD_STOP D ON D.LDS_STOP_ID = C.MFSS_LDS_STOP_ID
> INNER JOIN LOADS E ON E.LD_TEMPLATE_FLAG = 'N' AND
> E.LD_LOCATION = D.LDS_LOCATION AND
> E.LD_LOAD_NUMBER = D.LDS_LOAD_NUMBER AND
> E.LD_LOAD_SUFFIX = D.LDS_LOAD_SFX
> WHERE A.LD_TEMPLATE_FLAG = 'N' AND
> A.LD_MANIFEST_FLAG <> ' '
> ORDER BY A.LD_TEMPLATE_FLAG, A.LD_LOCATION, A.LD_LOAD_NUMBER,
> A.LD_LOAD_SUFFIX
>
> The plan shows that indices are being used for all JOINS:
>
> PLAN SORT (JOIN (B INDEX (LDS_LOAD_KEY),A INDEX (LD_LOAD_NUMBER_KEY),E
> INDEX (LD_LOAD_NUMBER_KEY),C INDEX (RDB$PRIMARY87),D INDEX
> (RDB$PRIMARY39)))
So, your assumption that E is using the unique index may be true, but not quite correct, it is only using a small, unselective, part of the index. My simple, stupid logic that doesn't quite agree with how Firebird does things, would expect (assuming 50% of the records having 'N' in the important field, and each B record linking to one A record): 100K*0.5*40K*0.5 = 1000000000 (can that be called 1G?). That should explain why things are slow even though C and D use their primary index.
How to fix it? Well, you have a few choices, I don't know which to prefer. I tend to index all appropriate fields individually myself, rather than use compound indexes, and I don't know if I've ever used compound keys (using compound keys sounds sensible, I just lack experience). You could try changing the order of the fields in your key to avoid situations like this, or you could force E back in your plan (when it can use all fields) by using LEFT JOIN and move LD_TEMPLATE_FLAG down to the WHERE clause (this will turn the LEFT JOIN into an INNER JOIN, with the exception that Firebird cannot move E back up in the PLAN). There are other things you could try as well, but these two options would probably be my first attempts if I were in your situation.
The good part is that you've shown us a brilliant example how the optimizers lack of knowledge of each part of compound indexes in Firebird 1.5 can be disastrous, and I hope I have been able to explain that a bit above.
HTH,
Set
-----Original Message-----
From: firebird-support@yahoogroups.com [mailto:firebird-support@yahoogroups.com] On Behalf Of Stephen Boyd
Sent: 27. februar 2008 00:08
To: firebird-support@yahoogroups.com
Subject: [firebird-support] Re: Query appears to put Firebird into a loop
> All I can guess (so far), is that LDS_LOAD_KEY is not selective at allLDS_LOAD_KEY is a unique, compound key consisting of
> in your query
LDS_TEMPLATE_FLAG, LDS_LOCATION, LDS_LOAD_NUMBER, LDS_LOAD_SFX AND
LDS_STOP_NUMBER. There will never be more that 20 rows with different
stop numbers for each load. So it is quite selective.
> that leaves LD_LOAD_NUMBER_KEY, whichLD_LOAD_NUMBER key is a unique, compound key consisting of
> I assume to have poor selectivity since the query is time consuming.
>
LD_TEMPLATE_FLAG, LD_LOAD_NUMBER and LD_SUFFIX. It is highly selective.
The only part of the query that should require a scan of a large
number of records in the stuff in the WHERE clause. That would have
to scan 40K records to find the 2K that I am interested in. None of
the JOINs should be reading more that 20 rows to get the rows that I
am interested in.
> and I also like to have a general ideaLOADS = 40K rows
> about the size of each table involved in the query.
>
LD_STOP = 100K rows
MF_STOP_STOPS = 20K rows
I find it curious that everything is fine until I JOIN the query back
to the LOADS table which, according to the plan, is using a unique key
to find the load records.