Subject Re: [firebird-support] DDL not "entirely" under transactional control?
Author Helen Borrie
At 12:52 PM 20/05/2006, you wrote:
>Hi all,
>
>As suggested by Helen I've tracked down what I meant in a prior thread
>by DDL statements being not "entirely" under transactional control. I'm
>not sure whether this is a bug or feature of Firebird; given that I
>mostly work with DB2 which uses a traditional locking mechanism for
>transactional control I could be misinterpreting an aspect of the
>multi-generational engine used by Firebird. Nevertheless, I'd be
>interested to find out the reason for the following behavior.
>
>Consider the following (admittedly contrived) situation:
>
>Given two separate, simultaneous transactions, the first transaction
>creates a table, inserts some data into it and then commits. After the
>table has been created, but before the first transaction commits, a
>second transaction attempts to read from the table.
>
>In other words, the sequence of events looks something like this:
>
>Step TX 1             TX 2
>==== ================ ================
>1    <start>
>2    CREATE TABLE X
>3    INSERT INTO X
>4                     <start>
>5                     SELECT * FROM X
>6    COMMIT
>
>Under DB2, which as mentioned uses a locking model, the following occurs
>when the above sequence is carried out in two simultaneous sessions (for
>anyone wishing to try this, remember to use -c- or +c to turn off
>auto-COMMIT in the command line processor):
>
>Step TX 1             TX 2
>==== ================ ================
>1    <start>
>2    CREATE TABLE X
>3    INSERT INTO X
>4                     <start>
>5                     SELECT * FROM X
>                      <blocks>
>6    COMMIT
>                      <retrieves data>
>
>In other words, after transaction 1 has created the table, but before it
>has committed, any other transactions attempting to access the table
>will block until transaction 1 commits or rolls back (at which point the
>other transactions would either succeed in accessing the table or fail
>with "table does not exist" respectively).
>
>Under Firebird, using a multi-generational model, the following occurs
>when the sequence is carried out in two simultaneous sessions (anyone
>wishing to try this, remember to use SET AUTODDL to disable auto-COMMIT
>of DDL statements in the isql tool):
>
>Step TX 1             TX 2
>==== ================ ================
>1    <start>
>2    CREATE TABLE X
>3    INSERT INTO X
>     <fails>
>
>Okay, so it looks like a table can't be used until the CREATE TABLE
>statement that created it is committed. This was the first thing that
>led me to wonder whether DDL statements were (or weren't) "entirely"
>under transactional control.
>
>That said, my gut instinct is that I'm not understanding some aspect of
>multi-generational architectures which mandates such behavior?
>
>Anyway, here's a second attempt, this time adding a COMMIT immediately
>after the second transaction starts to allow the INSERT in the first
>transaction to succeed:
>
>Step TX 1             TX 2
>==== ================ ================
>1    <start>
>2    CREATE TABLE X
>3                     <start>
>4    COMMIT
>5    INSERT INTO X
>6                     SELECT * FROM X
>                      <returns empty result set>
>7    COMMIT
>
>At first glance this looked a lot more like what I'd expected. However,
>after thinking about it, there was still something odd about this too:
>given that transaction 2 started before the CREATE TABLE statement was
>committed, shouldn't it be complaining that table X doesn't exist
>instead of returning an empty result set?
>
>To put it another way, assume the SELECT statement in the second
>transaction is moved after the second COMMIT in the first transaction:
>
>Step TX 1             TX 2
>==== ================ ================
>1    <start>
>2    CREATE TABLE X
>3                     <start>
>4    COMMIT
>5    INSERT INTO X
>6    COMMIT
>7                     SELECT * FROM X
>                      <returns empty result set>
>
>In this case, the SELECT statement in the second transaction can't see
>the inserted (and committed) records from the first transaction, yet it
>can see the table created (and committed) by the first transaction.
>
>Again, I expect I'm not understanding some aspect of the MGA. Please
>don't take any of the above as criticism of the MGA or of Firebird; the
>behavior demonstrated above, whether bug, feature, or entirely normal,
>hasn't caused me any real headaches - it's just something I happened to
>run across and was rather confused by. If someone can help me understand
>it, so much the better.
>
>For reference, I tested the above on Firebird embedded 1.5.3 on Windows,
>and Firebird SS 1.5.3 on a Gentoo Linux box (oh, and DB2 UDB 8, FP11 on
>the same Gentoo box).

I think there are quite a few things you don't understand. You
really need to abandon all those precepts you're bringing to your
Firebird experience from "databases" that are really just physical
tables with bells and whistles. :-) In Firebird, you are not dealing
with physical tables at all but with lumps of data that are plonked
onto a chunk of disk and catalogued as "a row in a structure that you
think of as a table".

When you "create a table", what you are actually doing is providing a
mass of information about a row structure that you want to think of
as a table. (Actually, the database engine calls this definition a
"relation". The SQL standard calls it a "table".)

The DDL engine sifts through this mass of information and creates
rows in at least two relations in its own private database. First,
it creates a row in a relation called RDB$RELATIONS. The important
thing it stores there is the name of the new relation
(RDB$RELATION_NAME). It will use this as a key for relationships
between this relation name and "things" in other tables that, well,
RELATE to it.

In another table, named RDB$RELATION_FIELDS, it stores one row for each
column in your table definition. One of the important things in this
table is RDB$FIELD_NAME, since it will be a key to things that relate
to that field.
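
Once that DDL is committed you can look at those rows yourself. A
rough sketch, assuming the table was created as X as above:

   SELECT RDB$RELATION_NAME
     FROM RDB$RELATIONS
    WHERE RDB$RELATION_NAME = 'X';

   SELECT RDB$FIELD_NAME
     FROM RDB$RELATION_FIELDS
    WHERE RDB$RELATION_NAME = 'X';

The second query returns one row per column (ID and SOME_VALUE in the
illustrative definition above).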

Now, these metadata relations start to get complicated as various
attributes get spread around more relations, but I hope there's
enough here to show that there is DML going on behind any DDL
statement you post. Now, until you COMMIT that DDL statement, your
transaction can "see" the DDL that it did; but the database is not
yet in a state where it can safely process any DML requests for the
so far partly created object, because the transaction has not yet
been committed.

The isql tool - which is not part of the engine, but is a client
application - is written to "do the right thing" with DDL
statements. That is, its default behaviour is to autocommit DDL
statements unless you tell it otherwise, using SET AUTODDL OFF. It
literally starts a new transaction when you pass a DDL request and
then commits it immediately upon execution.

The whole idea there is to make sure that, if you don't really
understand how things work under the blankets, at least isql will try
to stop you from doing something silly (like trying to poke data into
a table that isn't ready to store data........right?)

Now, isql also lets you pass DML requests. When autoddl is on, all
DML requests run in a *separate* transaction. They will be executed
as and when they are encountered but they will never be committed (or
rolled back) unless you explicitly send a COMMIT request (or a
ROLLBACK request). Then, at that point, they will ALL be committed
(or rolled back) or, if there is an exception, nothing is committed.
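
In isql terms, that default behaviour looks roughly like this (the
columns are the illustrative ones from above):

   SET AUTODDL ON;
   /* the default; shown only for clarity */
   CREATE TABLE X (ID INTEGER NOT NULL, SOME_VALUE VARCHAR(40));
   /* isql runs the DDL in its own transaction and commits it at once */
   INSERT INTO X VALUES (1, 'first');
   INSERT INTO X VALUES (2, 'second');
   /* both inserts are posted, uncommitted, in the separate DML transaction */
   COMMIT;
   /* now both inserts are committed together (or, on an exception, neither is) */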

If you set autoddl off, then both your DDL and DML requests will be
in the same transaction. So the fireworks can really start happening
if you start throwing inserts and dependencies at an uncommitted
database object.
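
That is essentially what happened in your first Firebird sequence. In
isql it boils down to this sketch:

   SET AUTODDL OFF;
   CREATE TABLE X (ID INTEGER NOT NULL, SOME_VALUE VARCHAR(40));
   /* posted to the system tables, but not committed */
   INSERT INTO X VALUES (1, 'first');
   /* fails: the object's definition is still uncommitted, so it
      cannot accept data yet */
   COMMIT;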

There are lots of utility programs around that allow you to submit
DDL requests. The better ones work hard to prevent you from shooting
yourself in the foot, or at least ask for double-triple confirmation
that you really want to. There are a few that actively encourage you
to do really daft things.

Now, let's move on to your apparent misconceptions about transactions.
1. A transaction can see everything it has done since it
started. No transaction can see the uncommitted work of any other
transaction.

So --- when your first transaction creates the table, it "sees" the
row that has been posted to RDB$RELATIONS and also (I think, at this
stage) fragments of rows that have been posted to
RDB$RELATION_FIELDS. Unfortunately, it can't see quite a lot of
other bits and bobs that get posted into other system tables and into
catalogues when DDL gets committed. All bets are off as to what
might turn up as data in the table when this heterogeneous
transaction is eventually committed.

You see me using the term "post" here, separately from
"commit". "Post" is a kind of vernacular way of referring to the
creation of a new record version. New record versions exist in the
database, visible to the transaction that created them, but they
don't change database state and may never do so. (Rolling back a
transaction invalidates any and all new record versions it created...).
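
As a trivial illustration, still using the illustrative table X:

   INSERT INTO X VALUES (3, 'third');
   /* posted: a new record version exists, visible only to this transaction */
   ROLLBACK;
   /* the posted record version is invalidated; database state never changed */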

So, unlike your file-based database tables, a Firebird "table" at
creation is just a collection of attributes in a number of system
tables. Its capability to accept data comes *after* the transaction
containing the user's DDL request has been committed. That
capability is governed by things--not all of which exist at "post"
time--that the engine refers to in order to create the structures
that we think of as "rows". A row in a Firebird table is physically
nothing like a record in an xBase table.

In short, there's enough here to tell you that, when documentation
tells you not to mix DDL and DML statements in the same transaction,
you should listen.

The other thing you appear to misunderstand is "blocking". A SELECT
doesn't block anything - not other SELECTs nor (in normal isolations)
DML. The catchphrase is "Readers don't block writers or other
readers [unless you intend them to]." In your scenario, once the DDL
for the table is committed, any transaction can start throwing data
at it. Whether a particular request will succeed, or will fail with
some form of conflict, will depend on the parameters of the
requesting transactions.
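
Those parameters are the ones you declare when you start each
transaction. A rough sketch of the sort of thing you might use, from
isql or from your own client code:

   SET TRANSACTION READ WRITE WAIT ISOLATION LEVEL SNAPSHOT;
   SELECT * FROM X;
   /* this reader holds a stable view of X and blocks nobody */
   COMMIT;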

So, while it is *possible* to set up a "blocking" scenario, with
aggressive transaction settings, you normally wouldn't do
so. Throughput is usually smoother if, rather than interfering with
the optimistic locking, you go with it and do a very effective job of
intercepting and handling exceptions. That is, after all, what all
those gazillions of exceptions are meant for.
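
If you really did want something closer to the DB2-style blocking,
the aggressive end of the spectrum looks roughly like this (a sketch,
not a recommendation):

   SET TRANSACTION READ WRITE WAIT
       ISOLATION LEVEL SNAPSHOT TABLE STABILITY;
   /* "consistency" mode: whole tables are locked as the transaction
      touches them, so other transactions' writes to those tables
      must wait or fail */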

(In fact, there's one peculiar set of conditions where a reader can
block a writer, absent any intentional blocking, totally surprising
the developer. The reader does a select...where x = 'y'. Then the
writer, in a NO WAIT transaction, tries to do an insert to the
table. That will except immediately. If the writer's transaction is
in WAIT, then the request won't except, it will wait until the
reader's transaction commits. It's up to the application developer
to decide how to play this, but play s/he must.)
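
The choice is just a parameter on the writer's transaction, roughly:

   SET TRANSACTION READ WRITE NO WAIT;
   /* a conflicting request excepts immediately */

   SET TRANSACTION READ WRITE WAIT;
   /* a conflicting request waits for the other transaction to finish */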

./heLen