Subject AW: [firebird-support] Re: select in (subquery)
Author Alexander Gräf
> -----Ursprüngliche Nachricht-----
> Von: Svein Erling Tysvær
> [mailto:svein.erling.tysvaer@...]
> Gesendet: Mittwoch, 1. Dezember 2004 10:18
> An: firebird-support@yahoogroups.com
> Betreff: [firebird-support] Re: select in (subquery)
>
> Hence, it is
> often slow if that is your only limiting criterium on a big
> table, whereas it is brilliant when you have additional
> (indexed) criteria limiting the result set.
>
> Set
>

As I have written:

> They are simply fast when you don't retrieve too
> many records, or if they are static.

Many subselects can be rewritten as joins and vice versa. Most of the time, joins are much faster. This is no empiric analysis, but my very own experience. I use subselects only as a substitution for aggregate functions, which are the only thing, which has even worse performance. The typical question "How many items do all customers have ordered" can be written as join or as subselect:

SELECT Customers.ID, Customers.Lastname, Customers.Firstname, COUNT(*) FROM Customers
LEFT OUTER JOIN Orders ON Customers.ID = Orders.CustomerID
GROUP BY Customers.ID, Customers.Lastname, Customers.Firstname;

In tutorials for SQL, thats a typical example, but with FireBird this is simply slow (if the table is filled with real-world data). One big problem here are the columns for GROUP BY. The same query can be written the following in some other databases:

SELECT Customers.*, COUNT(*) FROM Customers
LEFT OUTER JOIN Orders ON Customers.ID = Orders.CustomerID
GROUP BY Customers.ID;

Depending on the count of columns to retrieve, this can get several times faster than the first approach. However, simply using a subquery, and the whole thing gets a magnitude faster:

SELECT Customers.*,
(SELECT COUNT(*) FROM Orders WHERE Customers.ID = Orders.CustomerID)
FROM Customers;

However, one can combine all bad things together: Using a join with GROUP BY and a subselect, which calls an SP with one of the cols from the outer tables. If one would look into the database, he could see the database calling the SP 17 mil. times (for example). The whole thing get hundred times faster if taking the SP out of the subselect, and using it as the first table in the join, because it only get evaluated once.

Regards, Alex