Subject Re: Short or long transactions
Author Adam
> How do you determine what constitutes "unrelated"?

Surprisingly easily.

If you were to let clumsy Adam near your server to trip over the power
cord between the UPS and the server at the worst possible moment in
time, would your database be in a consistent state?

Let's take a simple spreadsheet example (my apologies if Yahoo eats the
formatting):

  |  A  |  B  |   C
--+-----+-----+--------
1 | =1  | =2  | =A1*B1

(looking at the values now):

  |  A  |  B  |  C
--+-----+-----+-----
1 |  1  |  2  |  2

Suppose you changed A1 to =4. The result would be:

  |  A  |  B  |  C
--+-----+-----+-----
1 |  4  |  2  |  8

Let's presume this required you to run two simple update statements:

update spreadsheet set value = 4 where row=1 and col='A';
update spreadsheet set value = 8 where row=1 and col='C';

(Again, not really the best design, but this is a simplified example).

Now let's presume you run these in two separate transactions:

1. Start Transaction
2. update spreadsheet set value = 4 where row=1 and col='A';
3. Commit;
4. Start Transaction
5. update spreadsheet set value = 8 where row=1 and col='C';
6. Commit;

This is a problem, because if I trip over the power cord anywhere
after step 3 but before step 6, you will be left with a spreadsheet in
this state:

  |  A  |  B  |  C
--+-----+-----+-----
1 |  4  |  2  |  2

That is what we mean by a unit of work. You don't commit inside a unit
of work, because a unit of work must complete entirely or not occur at
all (atomicity), and it must leave the database in a consistent state.
Consistency is defined by your business rules. There is nothing
inherently good or bad about A1 being 4 while C1 is 2, outside your
business rule that says the value in C1 must be consistent with the
result of a particular formula.

If you instead used the following:

1. Start Transaction
2. update spreadsheet set value = 4 where row=1 and col='A';
3. update spreadsheet set value = 8 where row=1 and col='C';
4. Commit;

It doesn't really matter when the server loses power; the database is
always left in an acceptable state according to the business rules.
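
Incidentally, if you wanted the database itself to police that rule, a
trigger can recompute C1 inside the same transaction. The following is
only a sketch against the simplified spreadsheet table above
(spreadsheet(row, col, value)) using Firebird PSQL; a real spreadsheet
engine would store formulas generically:

/* Sketch only: recomputes C1 whenever A1 or B1 changes, in the same
   transaction, so the formula rule can never be observed broken. */
SET TERM ^ ;

CREATE TRIGGER spreadsheet_recalc FOR spreadsheet
AFTER UPDATE
AS
  DECLARE VARIABLE a DOUBLE PRECISION;
  DECLARE VARIABLE b DOUBLE PRECISION;
BEGIN
  /* The guard also stops the trigger re-firing when it updates C1. */
  IF (NEW.row = 1 AND NEW.col IN ('A', 'B')) THEN
  BEGIN
    SELECT value FROM spreadsheet WHERE row = 1 AND col = 'A' INTO :a;
    SELECT value FROM spreadsheet WHERE row = 1 AND col = 'B' INTO :b;
    UPDATE spreadsheet SET value = :a * :b WHERE row = 1 AND col = 'C';
  END
END ^

SET TERM ; ^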


> > For inserts obviously, updates it would depend on the amount changed.
>
> This will affect the amount of work performed to actually store the
> data, but (I suspect) will have no impact on the question of
> transaction overhead. Is that incorrect?

If forced writes are enabled, the amount changed may affect how much
data has to be flushed at commit time, but I have never benchmarked
anything similar.

> > > The number of rows in the tables?
> >
> > Shouldn't
>
> Are you saying that the number of rows has no effect on transaction
> overhead? That's plausible, but again, I'm not concerned with the
> amount of work required to actually store data in the DB.

Firebird (unlike some DBMSs) stores updates to rows directly in the
database files at the time you issue the command. The record version
carries the id of the transaction that wrote it. The TIP (Transaction
Inventory Page) is effectively a bitmap of all transactions and their
state (active, committed, rolled back, limbo), so that when other
transactions encounter that record version they know whether they are
meant to ignore it or not.

Starting a transaction involves little more than incrementing a
counter. Committing a transaction involves updating some bits on the
TIP page. When forced writes are enabled as well, the commit will
force the pending changes to be flushed (at least for your
transaction; I am unsure of the exact implementation details).

> > > The mix of update/insert/delete?
> >
> > updates and deletes create record versions which must be garbage
> > collected at a later time, so only update or delete records that you
> > want to (there was a post a few days back where someone was deleting
> > records only to reinsert them).
>
> I don't think that's an issue here. But I do have some control over
> the ordering, if it's going to impact transaction overhead.
>
>
> >
> > > Cache size?
> >
> > Yes.
>
> How will cache size affect transaction overhead? Will more or less
> cache reduce transaction overhead, and how will that affect the actual
> work performed within the transaction?
>

I mean that a larger cache will reduce the expense of certain
operations within your transactions. It will not help or hinder the
act of starting or ending a specific transaction.
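
For what it's worth, the per-database default cache size can be
changed with gfix (a sketch; the database name and credentials are
placeholders, and a sensible figure depends on page size and server
mode):

gfix -user SYSDBA -password masterkey -buffers 8192 mydb.fdb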

>
> >
> > > Forced writes?
> >
> > Yes, but if you care about your data this is non-negotiable.
>
> Let's say I don't care about my data. How will forced writes affect
> transaction overhead?

Disabling forced writes allows the database to flush changes to disk
when it is convenient rather than straight away. I imagine that would
speed up commits, but I am not overly familiar with the inner workings
of asynchronous writes because they are inappropriate for our
environment (we do care about our data). If you do disable forced
writes, then you need to know that some older versions on some
platforms had some pretty serious bugs that could result in you losing
quite a lot of data.
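
The setting is per database and can be toggled with gfix if you want
to benchmark both modes (again a sketch with placeholder names):

gfix -user SYSDBA -password masterkey -write async mydb.fdb
gfix -user SYSDBA -password masterkey -write sync mydb.fdb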

> >
> > > Indices?
> >
> > Absolutely. Massive benefit to some selects, slight penalty for
> > everything else so design with respect to how you anticipate the data
> > to be used.
>
> There are surprisingly few selects, but I'm guessing that a row update
> is going to use the same internal mechanism that selects a row, so
> indices will play a role. But the question is, will the indices
> affect transaction setup and teardown?
>

No, they won't come into play here; I thought you were asking about
whether your transactions would run faster.

Adam