Subject | Re: SELECT FIRST in subselects |
---|---|
Author | Svein Erling Tysvær |
Post date | 2005-06-23T07:07:40Z |
Hello Bjoern!
First of all, I'd say this to me sounds like a bug (and I think I've
seen similar things on this list before). Secondly, using SELECT FIRST
and ORDER BY in a subselect is something I would avoid if at all
possible (I'm not certain whether I think Firebird should give you the
result you want or whether it should report an error, but ignoring the
IN clause doesn't sound right). Only tables with very few records with
AENDERUNG and DATUM would give acceptable speed with your query if it
worked. The reason being that the subselects would be executed for
every record and the sorting done after having grabbed all the
records. Hence, I would expect a table with 100 records of which 40
were AENDERUNG and 10 DATUM to process 4980 records if things worked
OK (the two records matching AENDERUNG doesn't have to check DATUM,
hence 20 less lookups than retrieving the 50 records 100 times). Now,
this is only my simplified understanding of Firebird which is
approximately as correct as 3-year-old kids' understanding of the
ocean to be big, but it is good enough for warning bells to sound when
seeing queries like yours.
Possible rewrites of your query include
SELECT * FROM DBINFO I
WHERE
(I."Label" = 'AENDERUNG' AND
NOT EXISTS(SELECT * FROM DBINFO I1
JOIN DBINFO I3 ON I1."Label" = I3."Label"
WHERE I1."Label" = I."Label"
AND I1."CreatedAt" > I."CreatedAt"
AND I3."CreatedAt" > I1."CreatedAt"))
OR
(I."Label" = 'DATUM' AND
NOT EXISTS(SELECT * FROM DBINFO I2
JOIN DBINFO I4 ON I2."Label" = I4."Label"
WHERE I2."Label" = I."Label"
AND I2."CreatedAt" > I."CreatedAt"
AND I4."CreatedAt" > I2."CreatedAt"))
(kids' count: subselects 40*2 + 10*2 - 2 = 98 + 100 checks on "Label"
= 198. Far better than 4980!)
or you could possibly use "subselect count" (I've never tried this, so
I'm not 100% certain it will work). This will be slower than the above
and to a large extent suffer the same speed drawbacks as your original
query (wow, the ocean IS really big):
SELECT * FROM DBINFO I
WHERE
(I."Label" = 'AENDERUNG' AND
(SELECT count(*) FROM DBINFO I1
WHERE I1."Label" = I."Label"
AND I1."CreatedAt" > I."CreatedAt") < 2)
OR
(I."Label" = 'DATUM' AND
(SELECT count(*) FROM DBINFO I2
WHERE I2."Label" = I."Label"
AND I2."CreatedAt" > I."CreatedAt") < 2)
HTH,
Set
First of all, I'd say this to me sounds like a bug (and I think I've
seen similar things on this list before). Secondly, using SELECT FIRST
and ORDER BY in a subselect is something I would avoid if at all
possible (I'm not certain whether I think Firebird should give you the
result you want or whether it should report an error, but ignoring the
IN clause doesn't sound right). Only tables with very few records with
AENDERUNG and DATUM would give acceptable speed with your query if it
worked. The reason being that the subselects would be executed for
every record and the sorting done after having grabbed all the
records. Hence, I would expect a table with 100 records of which 40
were AENDERUNG and 10 DATUM to process 4980 records if things worked
OK (the two records matching AENDERUNG doesn't have to check DATUM,
hence 20 less lookups than retrieving the 50 records 100 times). Now,
this is only my simplified understanding of Firebird which is
approximately as correct as 3-year-old kids' understanding of the
ocean to be big, but it is good enough for warning bells to sound when
seeing queries like yours.
Possible rewrites of your query include
SELECT * FROM DBINFO I
WHERE
(I."Label" = 'AENDERUNG' AND
NOT EXISTS(SELECT * FROM DBINFO I1
JOIN DBINFO I3 ON I1."Label" = I3."Label"
WHERE I1."Label" = I."Label"
AND I1."CreatedAt" > I."CreatedAt"
AND I3."CreatedAt" > I1."CreatedAt"))
OR
(I."Label" = 'DATUM' AND
NOT EXISTS(SELECT * FROM DBINFO I2
JOIN DBINFO I4 ON I2."Label" = I4."Label"
WHERE I2."Label" = I."Label"
AND I2."CreatedAt" > I."CreatedAt"
AND I4."CreatedAt" > I2."CreatedAt"))
(kids' count: subselects 40*2 + 10*2 - 2 = 98 + 100 checks on "Label"
= 198. Far better than 4980!)
or you could possibly use "subselect count" (I've never tried this, so
I'm not 100% certain it will work). This will be slower than the above
and to a large extent suffer the same speed drawbacks as your original
query (wow, the ocean IS really big):
SELECT * FROM DBINFO I
WHERE
(I."Label" = 'AENDERUNG' AND
(SELECT count(*) FROM DBINFO I1
WHERE I1."Label" = I."Label"
AND I1."CreatedAt" > I."CreatedAt") < 2)
OR
(I."Label" = 'DATUM' AND
(SELECT count(*) FROM DBINFO I2
WHERE I2."Label" = I."Label"
AND I2."CreatedAt" > I."CreatedAt") < 2)
HTH,
Set
--- In firebird-support@yahoogroups.com, Bjoern Reimer wrote:
>
> The following query gives me too much records.
> I'd expect four records.
>
> SELECT * FROM DBINFO I
> WHERE
> I."Id" in (SELECT FIRST 2 I1."Id" FROM DBINFO I1
> WHERE I1."Label" = 'AENDERUNG' ORDER BY I1."CreatedAt" DESC)
> OR
> I."Id" in (SELECT FIRST 2 I2."Id" FROM DBINFO I2
> WHERE I2."Label" = 'DATUM' ORDER BY I2."CreatedAt" DESC)
>
> What I get is
>
> SELECT * FROM DBINFO I
> WHERE I."Label" = 'AENDERUNG' OR I."Label" = 'DATUM'
>
> Where is my mistake?