Subject | Re: Firebird 2.1.x Indexing |
---|---|
Author | Adam |
Post date | 2008-09-18T23:43:24Z |
--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@...> wrote:
true of IN() when used in the context that the caution refers to...i.e.
are thinking about?
SQL> set planonly;
SQL> select firstname from employee where uid in (1);
PLAN (EMPLOYEE INDEX (PK_EMPLOYEE))
It gets a bit more unwieldy when you include multiple values
SQL> select firstname from employee where uid in (1,2,3);
PLAN (EMPLOYEE INDEX (PK_EMPLOYEE,PK_EMPLOYEE,PK_EMPLOYEE))
You don't want to know what it looks like when you get to 1499 records
in the in predicate.
In terms of select statements inside an IN, it converts it to the
equivalent where exists statement (in 1.5 or newer).
SQL> select firstname from employee where uid in (select employeeuid
from employeecontact);
PLAN (EMPLOYEECONTACT INDEX (PK_EMPLOYEECONTACT))
PLAN (EMPLOYEE NATURAL)
SQL> select firstname from employee where exists (select employeeuid
from employeecontact where employeeuid = employee.uid);
PLAN (EMPLOYEECONTACT INDEX (PK_EMPLOYEECONTACT))
PLAN (EMPLOYEE NATURAL)
Dmitry is right about where Firebird's performance can suffer in this
regard (with either IN or EXISTS predicate). Firebird always starts
with the outer table, and for each record performs the existence test
for *each row*. When the subselect is expensive and the outer is
cheap, Firebird can not (currently) reverse its order like it does for
joins. In the above example, employee.uid is the PK, so it has perfect
selectivity and is therefore very cheap to locate, if the outer select
contained several thousand records, and the inner select contained
just a few but involved many joins to return that, the expensive inner
query has to be run thousands of times, whereas if it could convert it
to the following (nearly) equivalent PSQL
for select employeeuid from employeecontact into :employeeuid do
begin
for select firstname from employee where uid = :employeeuid into
:firstname do
begin
suspend;
end
end
This would mean the expensive inner query only needs execution once
and the cheap outer query is executed once per result for the inner
query. This approach may be more expensive in other queries, so the
optimiser would need to be able to guess which approach is more
helpful for the situation at hand. It would also need to deal with the
possibility of there being multiple employeecontact records for each
employee record (the SP above would return it multiple times).
I don't think it is fair to call such SQL "unsmart". I have seen much
unsmart SQL (and admit to writing my fair share of it), but when you
have an arbitrary list of items you want to query (based on user
feedback or other factors external to the dbms such as sensor
information), you really have three options.
Firstly, you can use IN. Secondly, you can create a complex mix of
BETWEEN and OR statements (with brackets for order of operations).
This can cause less index hits if you expect blocks of consecutive
integers. Thirdly, you can insert the "interesting" record identifiers
into a temporary table then join to it.
Option 3 has only just become available (without a fair bit of hacking
around and self cleanup) in 2.1, and even now there is no easy high
speed way (particularly over slow networks) of bulk inserting the
interesting record identifiers (but I believe that is SQLs fault
rather than Firebird's).
From an implementation perspective, it is preferable to write your
query as an inner join rather than an exists (where that is possible).
NOT IN is a different kettle of fish. Again, IIRC it previously used
indices. In Firebird 1.5,
SQL> select firstname from employee where uid not in (select
employeeuid from employeecontact);
PLAN (EMPLOYEECONTACT INDEX (PK_EMPLOYEECONTACT))
PLAN (EMPLOYEE NATURAL)
But in 2.1 (or possibly 2.0?) this was changed because the results
were wrong when there were nulls involved. Obviously these should be
avoided where possible, or only used where the sub select is very
cheap to run without indices.
Adam
>since Fb 1.5 to make sense of "unsmart" SQL, I think this is still
> At 02:13 19/09/2008, you wrote:
>
> >On Page 397, Expressions and Predicates, under Other Comparison
> >Predicators, there is a CAUTION block that starts:
> >
> >[It is a common "newbie" mistake to treat the predicate "IN (<value>)"
> >as if it were equivalent to "= <value>" because the two are logically
> >equivalent, insofar as both return the same result. However, IN()
> >does not use an index. ...]
> >
> >If that's incorrect, I hope it helps you track down the affending
> >characters and give them a good talking too. :)
>
> Actually, although the optimizer has undergone a lot of changes
true of IN() when used in the context that the caution refers to...i.e.
>I am not sure this is the case in FB 1.5.x. Perhaps it was 1.0.x you
> ...where aColumn in ('a', 'b') will use an index
> but
> ......where aColumn in ('a') will not.
are thinking about?
SQL> set planonly;
SQL> select firstname from employee where uid in (1);
PLAN (EMPLOYEE INDEX (PK_EMPLOYEE))
It gets a bit more unwieldy when you include multiple values
SQL> select firstname from employee where uid in (1,2,3);
PLAN (EMPLOYEE INDEX (PK_EMPLOYEE,PK_EMPLOYEE,PK_EMPLOYEE))
You don't want to know what it looks like when you get to 1499 records
in the in predicate.
In terms of select statements inside an IN, it converts it to the
equivalent where exists statement (in 1.5 or newer).
SQL> select firstname from employee where uid in (select employeeuid
from employeecontact);
PLAN (EMPLOYEECONTACT INDEX (PK_EMPLOYEECONTACT))
PLAN (EMPLOYEE NATURAL)
SQL> select firstname from employee where exists (select employeeuid
from employeecontact where employeeuid = employee.uid);
PLAN (EMPLOYEECONTACT INDEX (PK_EMPLOYEECONTACT))
PLAN (EMPLOYEE NATURAL)
Dmitry is right about where Firebird's performance can suffer in this
regard (with either IN or EXISTS predicate). Firebird always starts
with the outer table, and for each record performs the existence test
for *each row*. When the subselect is expensive and the outer is
cheap, Firebird can not (currently) reverse its order like it does for
joins. In the above example, employee.uid is the PK, so it has perfect
selectivity and is therefore very cheap to locate, if the outer select
contained several thousand records, and the inner select contained
just a few but involved many joins to return that, the expensive inner
query has to be run thousands of times, whereas if it could convert it
to the following (nearly) equivalent PSQL
for select employeeuid from employeecontact into :employeeuid do
begin
for select firstname from employee where uid = :employeeuid into
:firstname do
begin
suspend;
end
end
This would mean the expensive inner query only needs execution once
and the cheap outer query is executed once per result for the inner
query. This approach may be more expensive in other queries, so the
optimiser would need to be able to guess which approach is more
helpful for the situation at hand. It would also need to deal with the
possibility of there being multiple employeecontact records for each
employee record (the SP above would return it multiple times).
I don't think it is fair to call such SQL "unsmart". I have seen much
unsmart SQL (and admit to writing my fair share of it), but when you
have an arbitrary list of items you want to query (based on user
feedback or other factors external to the dbms such as sensor
information), you really have three options.
Firstly, you can use IN. Secondly, you can create a complex mix of
BETWEEN and OR statements (with brackets for order of operations).
This can cause less index hits if you expect blocks of consecutive
integers. Thirdly, you can insert the "interesting" record identifiers
into a temporary table then join to it.
Option 3 has only just become available (without a fair bit of hacking
around and self cleanup) in 2.1, and even now there is no easy high
speed way (particularly over slow networks) of bulk inserting the
interesting record identifiers (but I believe that is SQLs fault
rather than Firebird's).
From an implementation perspective, it is preferable to write your
query as an inner join rather than an exists (where that is possible).
NOT IN is a different kettle of fish. Again, IIRC it previously used
indices. In Firebird 1.5,
SQL> select firstname from employee where uid not in (select
employeeuid from employeecontact);
PLAN (EMPLOYEECONTACT INDEX (PK_EMPLOYEECONTACT))
PLAN (EMPLOYEE NATURAL)
But in 2.1 (or possibly 2.0?) this was changed because the results
were wrong when there were nulls involved. Obviously these should be
avoided where possible, or only used where the sub select is very
cheap to run without indices.
Adam