Subject Re: [firebird-support] Efficient update on small subset of LARGE table?
Author Kjell Rilbe
Leyne, Sean skriver:
>
> Kjell,
>
> > I have a table "LargeTable" with ~170 million records.
>
> That's a "reasonable" sized table! ;-)
>
Well, I guess so, but it's large enough to cause severe problems
regarding execution time in many cases.

> > I need to update a single column on a "small" subset (maybe a
> million records or so),
> > but the subset selection requires references to other tables.
>
> A difficult problem.
> How many relates tables are used?
> Would the related tables be first level or second level joins?
> How many entries are there in the tables?
> Are the references FK values?
> How many total unique FKs values would the criteria resolve to?
> Do you have indexes on the update table for the FKs?
>
I guess the best overall answer I can give you is the query and plan for
a select that picks up the PK of the table that I want to update:

select LrgTbl."ECO_ID"
from "LargeTable" LrgTbl
inner join "Uppgiftshållare" UH on UH."ECO_ID" = LrgTbl."Hållare"
inner join "Uppgiftshanterare" Uhant on Uhant."ECO_ID" = UH."Hanterare"
where LrgTbl."ÄrNull" = 0
and UH."Propertynamn" in ('This', 'That', 'Other', 'Extra')
and Uhant."ColX" = 'Whatever';

PLAN JOIN (UHANT INDEX (IX_ColX), UH INDEX (IX_UH_Uhant), LARGETABLE
INDEX (IX_LrgTbl_UH))

Selectivity of IX_ColX = 0.125000
Selectivity of IX_UH_Uhant = 0.000000
Selectivity of IX_LrgTbl_UH = 0.000000

There are no FK:s because the schema is generated by an O/R mapper that
allows circular refs, which would be problematic with FK:s in place. But
all "candidate FK:s" are indexed, as above.

UH contains around 150 million records and UHANT about 10-20 million I
think (I didn't check).

> > Unfortunately, Firebird does not support joins in update or delete
> queries,
> > resulting in natural scan of "LargeTable" and consequently 170 million
> > executions of each correlated subquery (required to form my desired
> > selection).
> >
> > So, is there a better way?
>
> Depends on how you define "better".
>
> At the moment, the best scenario I can see is that you still have a
> natural scan. But that the "cost" of the related tables references
> would be minimized.
>
> > Note: I tried a "straightforward" update with correlated subqueries.
> It took 17
> > hours. I think a natural scan takes about an hour...
>
> I would hope that the new approach would takes 2X natural scan.
>
> How would that performance be?
>

What new approach are you referring to?
2 X time for natural scan would be fair enough. It took 1½ hour to
execute an update using execute block with for select .... do update....
where the select picked PK:s from a temporary table I filled with a
select like the above. It did 2522142 updates.

Regards,
Kjell