firebird-support - Re: [firebird-support] DDL not "entirely" under transactional control?

Subject	Re: [firebird-support] DDL not "entirely" under transactional control?
Author	Helen Borrie
Post date	2006-05-21T08:25:06Z

At 02:35 PM 21/05/2006, Dave Hughes wrote:

> > You
> > really need to abandon all those precepts you're bringing to your
> > Firebird experience from "databases" that are really just physical
> > tables will bells and whistles. :-)
>
>Ooh, I'd say DB2's a little more than "physical tables with bells and
>whistles"... Still, I'll try to avoid wandering off-topic to defend my
>"other" favorite database ;-)

Ooops, I'm sure you wrote db2 but I internalised "dbf". (Hence also
the later reference to xBase.

>Actually, I suspect you've misunderstood the intention of my little
>experiment above. To clarify:
>
>What I am attempting to understand is why (in Firebird) a DDL operation
>(such as CREATE TABLE) within a transaction leaves the target of the
>operation "unsafe" until the transaction is concluded, e.g. why a table
>created with a CREATE TABLE statement cannot be accessed until the
>transaction is committed.
>
>This is something I've not encountered before (with other database
>engines), which tripped me up in an application I expanded to talk to
>other databases.

OK, I *did* understand that you were trying to show that. And I was
trying to explain how it comes about that you would bump into one set
of conditions using isql with autoddl ON and another with it OFF.

Now, because it's Sunday, I will meander a bit.

Historically, in the SQL era, IB developers built database metadata
by running DDL scripts in isql. Before IB was an SQL database, it
had its own language, GDML, which had (still has, it's still there) a
lot more capability than SQL. I never learnt it (I was born too
late) but there are still people alive in the world who do. The
original interface tool for GDML was (still is) qli. Over the years
I've seen Ann give examples of things she would do with qli so I'd be
unsurprised if there were some secrets lurking in that playground
that would fascinate.

>Initially, the application was a DB2-specific application, but after
>making the database interface layer a bit more generic I took it for a
>spin with some other database engines, initially Firebird (the embedded
>version on Windows, later a SuperServer instance on a Linux box).
>
>At certain points the application, within a single transaction, creates
>a table and inserts some data into it. Another thread in the application
>attempts to read the inserted data from the table shortly after its
>creation.

Oh dear.

>Obviously the second thread reading data from the table needs to wait
>for the table to come into existence and be populated with data. Under
>the locking architecture of DB2, the application didn't have to be too
>precise about this: provided the second thread began its SELECT
>operation after the first thread had performed the CREATE TABLE
>operation (but not necessarily after the INSERT operation, or the
>transaction committing), the SELECT statement would simply block until
>the transaction creating and populating the table had finished.

Intriguing, it sounds like a workaround to avoid dirty reads.

>I was interested to find out what would happen with an MGA instead of a
>locking architecture. My expectation was that the SELECT would
>immediately fail with something like "table X not found" (not a bad
>thing, just an indication that the application's assumptions would need
>adjusting to cope with a database engine that offered more concurrency
>than one with a traditional locking model).
>
>However I was surprised when, under Firebird, the INSERT in the first
>thread's transaction (rather than the SELECT in the second thread's
>transaction) returned "table X not found" instead.

Happily, too, one hopes. Unless this bit of description is
over-simplified, it's probably worth warning you to explore what
happens when you try to multi-thread a connection....not going there
again this week (see postings from last week).

>My speculation above was that maybe there was something I wasn't
>understanding about multi-generational architectures that mandated that
>DDL operations make their target "unsafe" until the transaction is
>concluded (successfully or otherwise). I now understand my speculation
>is wrong...

Not entirely. A lot of other Dbms's have a completely separate
engine for processing DDL. So it is important to understand that
Firebird doesn't. Under the lid, DDL statements from the surface
become DML statements down in the mine. When you're using isql and
autoddl is on, no worries. Each DDL statement autocommits, all the
internal triggers that construct the bits and bobs are fired and all
is hunky-dory for you to throw DML or more DDL at it. It's no
different to what you are used to. When you remove autoddl from the
picture - as you do if you roll your own metadata factory - you've
got to take the same care to make sure DDL gets committed before
anything tries to refer the object. The DML that the engine has
posted to the system tables, and any that has been performed via
triggers on the system tables, actually exists on disk (as new record
versions for the system tables). So MGA *is* relevant because, with
the DDL and the DML in the same transaction, the transaction can see
all those recversions that have been posted for the DDL.

Once DDL is committed, there is more that the engine does (in
another, *internal* transaction, as I understand, although Ann is the
one who knows all these details...keep watching....). When the
transaction ends, everybody gets to see the table.

>Yes, same as in DB2 (SYSCAT.TABLES, or more precisely SYSIBM.SYSTABLES,
>would be the equivalent of RDB$RELATIONS in this case).

From a storage POV, yes. But not comparable in the implementation
sense, since non-MGA databases automatically "autocommit" DDL, in the
sense that the attributes of a request exist only in a log prior to
the commit. That's to say, if the relation attributes exist in a DB2
system table, they couldn't be anything except committed. With MGA
as it's implemented in Firebird, the initial attributes of the
objects exist in the system tables *before* they are committed (or
rolled back).

>Okay, got it: Firebird's isql is doing the Right Thing (for Firebird):
>ensuring DDL statements are committed ASAP in an attempt to avoid DML
>statements from tinkering with the target of a DDL operation in the same
>transaction.

Yes; and not just user DML. The next statement might be another DDL
statement that requires the engine to kick off further DML on the
same system tables and/or build indices on elements of the table
and/or refer to the table for a FK reference or an operation in a SP
or a trigger in another table or....whatever...

>Ergo, I need to replicate such behavior in the application in question:
>when connected to a Firebird database, DDL statements must be committed
>before any DML statements attempt to manipulate the target of the DDL
>operation.

Yes. Likewise if you are running scripts.

>Ah, but *why* is a table's capability to accept data delayed until after
>the transaction containing the DDL instantiating it is committed?

Is that question answered?

>That's
>not something dictated by the illusory nature of a table in a relational
>database (suffice it to say, I'm well aware that a table is more of a
>concept than a physical entity in such a system).
>
> > In short, there's enough here to tell you that, when documentation
> > tells you not to mix DDL and DML statements in the same transaction,
> > you should listen.
>
>Ah, here's the crux of the matter: somehow I've missed reading (or
>possibly recalling) this important piece of information in Firebird's
>documentation (rest assured I doubt it'll slip my mind in future :-)
>
>I'm going to try and tread carefully here as I sincerely don't mean this
>as criticism of Firebird (which I regard as an impressive database engine):
>
>I had thought that maybe this particular behavior (that "safe" access of
>an object is delayed until the DDL creating it is committed) was
>something peculiar to multi-generational architectures. However, today
>after testing the aforementioned application against PostgreSQL 8 (which
>I believe also uses an MGA?), it performed as I had initially expected
>an MGA would (the SELECT statement in the second thread immediately
>returns "table X not found", and the INSERT statement in the first
>thread succeeds).
>
>Therefore I concluded that this behavior is not peculiar to MGAs, but is
>peculiar to Firebird.

I don't think you were off-track in supposing that it had to with
MGA. But it would be a mistake to assume that all MGAs are created
equal or that all MGA engines behave the same way because they are
MGA. Firebird (i.e. InterBase) was the original MGA and never was
anything else except a pattern of blue and white tiles on a bathroom
wall in Massachusetts. Since "MGA" in other MGA engines was
superimposed over one form or another of the conventional locking
architecture, none of them was designed as an MGA.

>After a bit of Googling and a bit of reading, I think it stems from
>a preference of the developers. From a talk by Ann
>Harrison:
>
>"...Metadata objects cannot be changed in a stored procedure. No DDL
>commands are allowed. Part of the reason for that is a strong prejudice
>on the part of the developers that on-line metadata updates are a good
>idea but can be taken too far. The execution of a DDL command occurs in
>two parts. When the command executes, it updates the system tables and
>creates a description of the physical changes that must be made. That
>description is associated with the transaction. When the transaction
>commits, the DDL changes are instantiated. This is very different from
>some commercial databases where DDL changes are always made through
>stored procedures."
><http://www.ibphoenix.com/main.nfs?a=ibphoenix&page=ibp_act_db>
>
>Although Ann's talking about stored procedures here (and their relation
>to view triggers in particular), the sentence talking about the "strong
>prejudice on the part of the developers", and the ones following explain
>(I think?) why Firebird "dislikes" mixing of DDL and DML statements
>within a transaction (please correct me if I've misinterpreted this!)

You've correctly interpreted Ann's (and my) strong views about mixing
user operations with metadata modification. Given that Ann had an
influential role in the process that converted the vision on the
bathroom tiles into the database engine that we're now discussing,
her prejudices on this subject are worth taking in. Not to mention
that the more strongly a relational database engine can preclude the
ability to mess with metadata via user operations, the more strongly
"C" its ACID-compliance will be.

For better or for worse, Fb 1.5 introduced the EXECUTE STATEMENT
syntax into PSQL. Now, while that's really beaut for some neat
little tricks that are hard or impossible to do without, it's a can
o' worms for its ability to submit DDL requests and thus work around
the protection that prevents idiots from writing PSQL
time-bombs. Sigh....they call it progress, don't they?

>It's not a bad thing as far as I'm concerned: I'm inclined to agree with
>the sentiment that "online metadata updates ... can be taken too far",
>and it didn't take more than a few minutes to adjust the application to
>separate DDL and DML actions into separate transactions.

Thus making the world a safer place to live in, to boot. :-) (now, is
that prejudice, or what?)

>I'm definitely looking
>forward to playing more with MGAs - the possibilities in a
>multi-threaded application are very interesting (especially with the
>embedded Firebird engine), and now I know to separate DML and DDL
>statements everything should go a lot smoother.

Especially if, for threading, you are always meticulous about
instantiating a complete "session" for every thread (including the
main one) that is going to perform any database access and ensuring
that a thread never tries to refer to any data object in any other
thread (including the main one). "Session" = database connection[s],
transaction[s] and request[s].

>(Although it's going to be a while before I stop doing double-takes when
>I happen to see the size of the embedded DLL - after all, I'm most
>familiar with an RDBMS that gobbles near half a gig of space :-)

...och, but some folk want butter on both sides of the bread....a
small footprint AND 9,999 inbuilt functions. :-)

./heLen