Subject Re: [firebird-support] Performance problem
Author Helen Borrie
At 04:18 AM 1/11/2007, you wrote:
>Hi All -
>
>I have a scientific data set (Jim Hansen's global temperature anomaly
>grid) which I am trying to do a plot with. When I run a query to
>generate the plot data from an embedded Firebird 2.0.3 database, it
>takes about 25 minutes:
>
> SELECT ((CASE WHEN (F_INTEGERABS("LandOceanTemps"."Lat (deg)") >
>CAST(66.5 AS DOUBLE PRECISION)) THEN _UNICODE_FSS 'Polar'
> WHEN (F_INTEGERABS("LandOceanTemps"."Lat (deg)") <
>CAST(23.5 AS DOUBLE PRECISION)) THEN _UNICODE_FSS 'Tropical'
> ELSE _UNICODE_FSS 'Temperate' END)) AS
>"none:Climate Zone:nk",
> ((CASE WHEN ("LandOceanTemps"."Lat (deg)" >= 0) THEN
>_UNICODE_FSS 'Northern' ELSE _UNICODE_FSS 'Southern' END)) AS
>"none:Hemisphere:nk",
> (EXTRACT(YEAR FROM "LandOceanTemps"."Date")) AS "yr:Date:qk",
> (AVG(CAST("LandOceanTemps"."Anomaly" AS DOUBLE PRECISION))) AS
>"avg:Anomaly:qk"
> FROM "LandOceanTemps"
> GROUP BY 1,
> 2,
> 3
>
>The query plan is.
>
>PLAN SORT ((LandOceanTemps NATURAL))
>
>The same query against a MySQL database containing the original data
>runs in about 47 seconds:
>
> SELECT ((CASE WHEN (ABS(`LandOceanTemps`.`Lat (deg)`) > 66.5)
>THEN 'Polar'
> WHEN (ABS(`LandOceanTemps`.`Lat
>(deg)`) < 23.5) THEN 'Tropical'
> ELSE 'Temperate' END)) AS
>`none:Climate Zone:nk`,
> ((CASE WHEN (`LandOceanTemps`.`Lat (deg)` >= 0) THEN
>'Northern' ELSE 'Southern' END)) AS `none:Hemisphere:nk`,
> (YEAR(`LandOceanTemps`.`Date`)) AS `yr:Date:qk`,
> (AVG(`LandOceanTemps`.`Anomaly`)) AS `avg:Anomaly:qk`
> FROM `LandOceanTemps`
> GROUP BY 1,
> 2,
> 3
>
>The Firebird schema is
>
>CREATE TABLE "LandOceanTemps" (
> "Anomaly" DOUBLE PRECISION,
> "Date" DATE,
> "Lat (deg)" INTEGER,
> "Long (deg)" INTEGER
> );
>
>CREATE INDEX "_tidx_128_2a" ON "LandOceanTemps"."Date" ASC;
>CREATE INDEX "_tidx_128_3a" ON "LandOceanTemps"."Lat (deg)" ASC;
>CREATE INDEX "_tidx_128_4a" ON "LandOceanTemps"."Long (deg)" ASC;
>
>The table has 10,886,400 rows. The dates are all month precision;
>lat/long are on a 2 degree grid and the Anomaly values range from
>-11.7 to 14.09.
>
>Any ideas on why this is 50x slower? It is just numbers!

It's not "just numbers". It's 10 million rows consisting entirely of calculated fields, arriving at a sort set for which no indexes are available. The sort isn't on numbers, either, it's on derived text of up to 60 bytes long, which has to be recalculated repeatedly in the 3-level sort. The AVG calculation itself isn't even straightforward, thanks to the cast. The sort set is too large to be in memory so it's being done on disk in files.

All that's why the Firebird query isn't fast.

As to why the MySQL query could be 50X faster, the only clue I can see is that, in that query, you are not converting those calculated text fields to unicode, i.e., that's two important places where there's a lot less work to do. As a reality check, you could run the Firebird query without the unicode conversion and see what you win.

./heLen