Subject RE: [firebird-support] Rdb$db_key inside triggers
Author Helen Borrie
At 08:52 AM 24/05/2005 +1000, you wrote:
>
> >
> > OK.
> >
> > > The delta table is inspected on a regular basis, and
> > > deltas are then
> > >
> >
> > A problem with using the db_key - aside from the fact that it
> > doesn't seem to do what you need - is that all versions of a
> > record share a single db_key value.
>
>At first, this shocked me, but it makes sense.
>The version of the record doesn't matter too much, as I have the deltas
>that have changed it.
>But, as db_key is misbehaving, a rethink is in order.

It's always been a false assumption to treat rdb$db_key as though it were a
perpetual, logical record ID (equivalent to the permanent record IDs that
some other DBMSs store). It can be a useful "hidden feature", as long as
you fully understand that it is transient, has a physical context specific
to time and place, and exists for the benefit of the engine. It can't be
treated as a permanent record locator outside of its immediate context.

Inside a specific transaction, there are tips and techniques whereby you
can make dependable use of the key in cursor operations on sets: take a
look at Claudio Valderrama's writings on the subject (www.cvalde.net).
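
For example (a rough sketch with invented table and column names), within
a single transaction you can fetch the key and then use it for a fast
positioned update:

  /* Fetch the key along with the data... */
  SELECT rdb$db_key, amount
    FROM mytable
   WHERE customer_id = 42;

  /* ...then, still inside the SAME transaction, update by it.
     :saved_key is a parameter holding the fetched key value. */
  UPDATE mytable
     SET amount = amount * 1.1
   WHERE rdb$db_key = :saved_key;

Because the key resolves directly to the record's physical location, the
engine doesn't need to walk an index to find the row again - but commit or
roll back, and all bets are off.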

Wasn't it a replication system that this was targeted at? When the scope
of unique keys extends beyond the boundaries of a single database, there's
no way you can transplant the physical location of a record in one database
into another. In relational databases, everything comes down to the
logical correspondence of stable keys; across boundaries, nothing else
will do.

>Yep. Plenty of unique identifiers there. Luckily, only a few pages of code
>have been done so far, so any changes will be swift.
>But if db_key isn't reliably caught by the trigger, I'll have to build a
>system to match the normal primary key on each table, meaning lots more
>fields, and slower searching.

Not necessarily. If the keys are well thought out, in terms of the
searching activities required of them, searching on them won't be worse
than on any other "thin" unique locator. Where searches do appear slower
than they should be, it frequently turns out to be one or more of three
things dragged into the design from legacy systems (a short sketch follows
the list):

1) No suitable indexes defined at all
2) Hierarchical keys (the usual reason why those legacy DBMSs needed unique
record locators, after all!)
3) "Bad" indexing - e.g. the well-known foreign key trap and other
poorly-selective indexes; unnecessary (often complex) indexes imported
from file-based systems; etc.
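
For instance (all names invented here), a generator-populated surrogate
key with a clean primary key index is about as "thin" as a locator gets:

  CREATE GENERATOR gen_customer_id;

  CREATE TABLE customer (
    customer_id  BIGINT NOT NULL,
    name         VARCHAR(60),
    CONSTRAINT pk_customer PRIMARY KEY (customer_id)
  );

  /* Populate the key automatically if the client doesn't supply one */
  CREATE TRIGGER customer_bi FOR customer
  ACTIVE BEFORE INSERT AS
  BEGIN
    IF (NEW.customer_id IS NULL) THEN
      NEW.customer_id = GEN_ID(gen_customer_id, 1);
  END

A search on customer_id goes straight through the PK index; there's
nothing hierarchical or composite to drag the optimizer down.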

>Alan's got something to look at over at meta.com.au, but I don't know
>pascal, so it's a hard slog figuring out if he uses triggers or not...

I haven't looked at that particular one, either. Replication systems
typically store atomic replication keys (using generators) for each table
being replicated. The replication tables don't store user data, only a
unique key structure consisting of the PK of the replication record (a
source ID + a generated replication ID) and the PK of the record in the
table being replicated, the pairing protected by a UNIQUE constraint.
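
A minimal sketch of that structure (names invented, assuming an integer
PK in the replicated table):

  CREATE GENERATOR gen_repl_id;

  CREATE TABLE repl_log (
    source_id   INTEGER NOT NULL,       /* originating database */
    repl_id     BIGINT NOT NULL,        /* generator-populated */
    table_name  VARCHAR(31) NOT NULL,   /* table being replicated */
    row_pk      BIGINT NOT NULL,        /* PK of the replicated record */
    CONSTRAINT pk_repl PRIMARY KEY (source_id, repl_id),
    CONSTRAINT uq_repl UNIQUE (table_name, row_pk)
  );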

There are triggers on both sides of the replication structure that try to
find a replication record matching the PK of the record that is to be
replicated. If one exists, the operation refers to that replication
ID (source ID + local generated value). If not, the trigger creates a new
replication record for that key, using the local source ID + a newly
generated value.

That's the "infrastructure" for the replication. How the actual data for
the replication is retrieved for the cross-database transaction is a matter
of implementation.

The IBO replication components, for example, perform "live" replication: a
transient table stores the replication IDs and a flag indicating whether
the replication operation is an insert, an update or a delete. If the
transient table already has a previous operation stored for a particular
record, it is resolved so that the "latest state" of the replicatING
database is all that is available to pass to the database being replicated
TO. At replication time, the source record is queried and the latest data
passed across to the target.

As an example of the effect of that, if a record is inserted in the source
database and then deleted before the next replication to the target, then
the target never knows that record ever existed.
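
A sketch of that resolution logic, for the delete case (invented names;
this is not the actual IBO code):

  CREATE TABLE change_log (
    table_name  VARCHAR(31) NOT NULL,
    row_pk      BIGINT NOT NULL,
    op          CHAR(1) NOT NULL,   /* 'I', 'U' or 'D' */
    CONSTRAINT uq_change UNIQUE (table_name, row_pk)
  );

  CREATE TRIGGER customer_ad FOR customer
  ACTIVE AFTER DELETE AS
    DECLARE VARIABLE prev_op CHAR(1);
  BEGIN
    prev_op = NULL;
    SELECT op FROM change_log
     WHERE table_name = 'CUSTOMER' AND row_pk = OLD.customer_id
      INTO :prev_op;

    IF (prev_op = 'I') THEN
      /* Inserted and deleted between replications: the target
         never needs to hear about this record at all. */
      DELETE FROM change_log
       WHERE table_name = 'CUSTOMER' AND row_pk = OLD.customer_id;
    ELSE IF (prev_op IS NULL) THEN
      INSERT INTO change_log (table_name, row_pk, op)
      VALUES ('CUSTOMER', OLD.customer_id, 'D');
    ELSE
      /* A pending update collapses to a delete. */
      UPDATE change_log SET op = 'D'
       WHERE table_name = 'CUSTOMER' AND row_pk = OLD.customer_id;
  END

At replication time, only the surviving entries are queried against the
source data and shipped across to the target.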

It's a simple replication model - "one-way" (asynchronous)
replication. More complex models perform synchronous two-way replication
that is highly dependent on accurate, timezone-neutral timestamping and/or
other devices to "age" the replication records uniquely.

Back in the delightful days of 2000, Dalton Calford posted a series of long
articles to firebird-architect, describing the replication setup they had
at a large Canadian telco for 24/7 operation. Amongst other things,
Dalton provides some "Really Useful" insights into the issues around
replication. These were rolled into a set of knowledgebase articles in
the Really Useful section, under the title "Some Solutions to Old
Problems". You can pick them up at
http://firebird.sourceforge.net/index.php?op=useful&id=calford_1

(though Sourceforge seems to be down at the moment...)

./heLen