Subject: Re: [firebird-support] Cascades on very large databases
Author: Ann W. Harrison
sqlsvr wrote:

> How is firebird cascade performance on a very large database (say 100gb)
> with lots of relations (especially when using natural keys)? I'm assuming
> it would take hours/days and probably block all other queries. If firebird
> used internal pointers for the foreign keys, it would be instant instead of
> hours/days.

Yes, if you create a schema on a 100Gb database so that updating the
master record will cascade to update every record in the database, it
will take a while - hours, I'd guess, not days. Whether the keys are
natural or artificial doesn't make much difference - except that you
very rarely see massive changes to artificial keys. All the changes
are indexed lookups, so relatively cheap.
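
For concreteness, a rough sketch of the kind of declaration involved -
the table and column names are invented placeholders, not anything from
the original question:

    CREATE TABLE master_tbl (
        nat_key CHAR(4) NOT NULL PRIMARY KEY,
        descr   VARCHAR(60)
    );

    CREATE TABLE detail_tbl (
        id      INTEGER NOT NULL PRIMARY KEY,
        nat_key CHAR(4) NOT NULL,
        note    VARCHAR(60),
        CONSTRAINT fk_detail_master
            FOREIGN KEY (nat_key) REFERENCES master_tbl (nat_key)
            ON DELETE CASCADE ON UPDATE CASCADE
    );

    /* one change to the master row fans out, through the index on
       detail_tbl.nat_key, to every detail row carrying 'ABCD' */
    UPDATE master_tbl SET nat_key = 'WXYZ' WHERE nat_key = 'ABCD';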

Readers would not be affected. They would continue to see the state
of the database before the cascading update. Read performance would
be affected to some degree. Since Firebird doesn't allow concurrent
transactions to change the same record, any update or delete that
affects a record being cascaded would wait, then fail if the cascaded
change commits. Inserts would wait until the final state of the
referenced record was determined. If one of the cascaded records
was changed by a concurrent transaction before the cascade got to it,
the operation that caused the cascade would fail and be undone.
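
Roughly, using the placeholder tables above and two concurrent
sessions (the session labels are only comments):

    /* Session 1: fires the cascade; detail rows get new versions */
    UPDATE master_tbl SET nat_key = 'WXYZ' WHERE nat_key = 'ABCD';

    /* Session 2, a concurrent snapshot transaction: readers still
       see 'ABCD' */
    SELECT nat_key FROM detail_tbl WHERE id = 42;

    /* Session 2: touching a row the cascade has already rewritten
       waits, then fails with an update conflict if session 1 commits */
    UPDATE detail_tbl SET note = 'changed' WHERE id = 42;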

Large cascading deletes are possible, though scary. Regardless of
the model - pointers or stored values - you've still got to get rid
of the deleted records and any index entries related to them. So
the projected saving would be on updates.
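
Again with the placeholder tables above, the delete side is just:

    /* removes the master row, every referencing detail row, and all
       the index entries that point at them - pointers or stored
       values, that work has to be done either way */
    DELETE FROM master_tbl WHERE nat_key = 'WXYZ';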

I doubt that the use of internal pointers for foreign keys would make
updates instantaneous unless there was a single value that cascaded
through the whole database. You could construct a case (badly) where
there's a referenced table containing, for example, the name of the
database owner, and every other table references that. In Firebird, you'd have a
worthless index on every table with that one value in it, and every
record would have that value. In the pointer case, all those records
could change by changing one value. But why would you do that, except
to demonstrate that pointers are good in a worthless case?

Multiple-level cascading updates are rare, because at each level
the referenced value must be unique. If records in two tables have
the same natural unique key, then the data definition is not
normalized.

A more reasonable case is a master table that cascades to most
other tables. For example, you might have a master (referenced)
table of state codes and state names and child tables (referencing)
that contain the state codes. Customers, orders, and warehouses
could all reference the state code table. Changing the code for
California from CA to CF might cascade to a large number of records.
In Firebird, you'd have to change both the data in the record and
the indexes that reference it. With pointers, you would need indexes
on the state code, I would guess, so you could find customers from
California.
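
A sketch of that layout - again, the names are just illustrative:

    CREATE TABLE states (
        state_code CHAR(2) NOT NULL PRIMARY KEY,
        state_name VARCHAR(40)
    );

    CREATE TABLE customers (
        customer_id   INTEGER NOT NULL PRIMARY KEY,
        customer_name VARCHAR(60),
        state_code    CHAR(2) NOT NULL,
        CONSTRAINT fk_cust_state
            FOREIGN KEY (state_code) REFERENCES states (state_code)
            ON UPDATE CASCADE
    );

    /* orders and warehouses would carry the same sort of constraint */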

>
> How would you delete/update a record that would cause lots of cascades?

With a SQL DML statement.
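
For instance, with the illustrative states table above, one statement
starts the whole cascade:

    UPDATE states SET state_code = 'CF' WHERE state_code = 'CA';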


Good luck,

Ann