Subject: Re: [firebird-support] Re: Serializing concurrent updates
Author: Ann W. Harrison
Hi Timo!

> What is the long answer? :-)

Actually, I looked at your question too quickly and missed the
fact that you've only got one record and you're updating it.
In that case, you don't need a unique index - it won't do any
good with only one record. You'll get update conflicts automatically.
Roll back and try again. When you succeed, you will have stored
the next value.
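
As a sketch, assuming a column I'll call VAL in MyTable:

  -- in a snapshot transaction:
  update MyTable set val = val + 1;
  select val from MyTable;  -- your own write: the value you just stored
  commit;
  -- on an update-conflict error: roll back and run it all again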

If you had selected the max value in the table and then stored
a new row with that value plus one, you'd need the unique descending
index.
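
That variant would look roughly like this (all names invented):

  create table next_vals (val integer not null);
  create unique descending index next_vals_desc on next_vals (val);

  insert into next_vals (val)
    select max(val) + 1 from next_vals;
  -- a duplicate-value error means another transaction stored that
  -- number first: roll back and try again

The descending index does double duty: it enforces uniqueness, and
it lets MAX() read one entry off the top of the index instead of
scanning the table.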

Here's the reason you don't want to use generators...


> no transactions: V1 is the current value in MyTable
> T1: starts
> T2: starts
> T2: generates a next unique value, V2, updates the value in MyTable
> T2: commits (new transactions see V2 after this, it is ok)
> T1: reads value from MyTable, gets V1 (getting V2 would be incorrect)
> T1: commits
>
> If using a generator, T1 gets V2 if it reads the current value of the
> generator, and this is wrong in my case. I hope you understand what I
> am trying to say.

Well, another short answer is to use generators. Rather than
reading the record and adding one to the stored value, get the
next generator value, update the record with that, and return
the value you got. T1 will continue to see V1.
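
For example (the generator name and the VAL column are mine):

  create generator my_gen;

  -- in each transaction that needs the next number:
  update MyTable set val = gen_id(my_gen, 1);
  select val from MyTable;  -- your own write: the value you got
  commit;

GEN_ID operates outside transaction control, so T1's snapshot of
the row stays at V1 until T2's update actually commits.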

That will leave holes in the sequence if any transaction rolls
back, which may be a problem. Given that you're updating a
single row rather than inserting new rows, you're going to get
a lot of update conflicts using the generator solution and so
a lot of rolled back transactions. Inserting the new value rather
than updating one existing record will reduce the number of
rolled back transactions, and if you have a descending unique index,
performance should be OK.

If holes in the sequence are a problem, search this list for
the term "auditable" to find a solution - not simple, but a
solution.

A longer answer is that transaction serializability is a hard
problem, especially if you are willing to consider several
different concurrency control methods. The SQL standard levels
of isolation correspond very closely to the behavior of systems
that use two-phase locking for concurrency control. Two-phase
locking works like this:

Don't hold any locks                 --- read uncommitted
Exclusive locks on records written   --- read committed
Shared locks on records read         --- repeatable read
"Predicate" locks                    --- serializable

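In SQL you ask for one of those levels like this (standard
syntax; note that Firebird's own SET TRANSACTION options -
SNAPSHOT, READ COMMITTED - don't map one-to-one onto them):

  set transaction isolation level read committed;
  set transaction isolation level repeatable read;
  set transaction isolation level serializable;
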
One of the many cute aspects of the SQL standard is that reads
in "repeatable read" isolation are not repeatable. They suffer
from "phantoms" - meaning that if you look for brown bears
(select species from bears where color = upper('brown'))
several times in the same transaction, you may get more results
on later queries than earlier, either because somebody stored
and committed a new brown bear, or because somebody changed
some bear from white to brown.
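
A sketch of a phantom under two-phase locking at "repeatable
read" (the rows are invented):

  -- T1: select species from bears where color = 'BROWN';  -- 3 rows
  -- T2: insert into bears (species, color) values ('Kodiak', 'BROWN');
  -- T2: commit;  -- record locks can't cover a row that didn't exist
  -- T1: select species from bears where color = 'BROWN';  -- 4 rows!
  --     the Kodiak is a phantom: same query, same transaction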

Thus "predicate" locks - i.e. locks on the "brownness" of bears.
In practice, predicate locking is normally accomplished by locking
the access paths or by locking a position in an index. The result
is that any full table scan locks out writers for the whole table.
And, for example, if you've got a table where some bears are black,
some are brown, and some are white, with an index on color, a
transaction that selects brown bears will block the insertion of
grey bears, green bears, blue bears, and purple bears. Albino
bears and yellow bears can still get in. As you can imagine,
a fully serializable lock-based system is likely to deadlock.

So that's the locking perspective. When you get to serious levels
of isolation, the effect of the locks is to order both reads and
writes. If you remember, I said something earlier about ordering
writes when I was blathering about optimistic concurrency control.
Your particular problem requires that both reads and writes be
ordered.

MVCC (multi-version concurrency control) establishes an order for
writes but not reads.

Read stability is maintained by keeping multiple versions of
records, each tagged with the transaction identification of
the transaction that created it. When a transaction starts,
it decides which other transactions were committed at that
instant, and reads the newest version of each record that was
created by a committed transaction. OK?

Seems pretty easy ... I see the state of the world as it was when
I started. If you and I are running at the same time, we see the
same thing. If we both write things that are ignored by the other,
there's no problem - either of us could go first. If we both
try to update or delete the same row, Firebird recognizes a fatal
condition - one of us is trying to change a row when we can't see
the most recent version. That rule prevents lost writes - where
you make brown bears green and I make brown bears grey, accidentally
eliminating all green bears. As long as we're changing the same
records, our writes are ordered.
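
As a sketch, with MyTable holding its one row and a column I'll
call VAL (my name, not yours):

  -- T1: set transaction isolation level snapshot;
  -- T2: set transaction isolation level snapshot;
  -- T2: update MyTable set val = val + 1;
  -- T2: commit;                      -- the committed value is now V2
  -- T1: select val from MyTable;     -- still sees V1: a stable view
  -- T1: update MyTable set val = val + 1;
  --     fails with an update conflict - T1 can't see the newest
  --     version of the row it's trying to change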

What MVCC doesn't prevent are the effects of unordered reads.
Suppose that you change something that I subsequently read. I
read the version before yours, so in any serialization order I
have to run before you. But suppose that I
then change something that you subsequently read. You also read
the version before mine, so in the ultimate ordering, you have
to come before me. Which is not possible. Easy anomalies, like
the one you suggest, can be avoided by adding unique constraints.

But there are harder ones - "write skew" is the classic, and
there's a sketch of it below. Some people take the position
that MVCC just doesn't work because there is no way to
serialize transactions. Others use it because the view it
provides of data is actually stable, and the anomalies it
produces are less disruptive than the blocking of a stricter
lock-based system.
Ask me some other time about the hybrid solution that InnoDB,
the MySQL transactional engine, uses. It gives different results
for plain SELECT and SELECT ... FOR UPDATE.
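
Here's that write skew sketch - the table, the column, and the
business rule are all invented for the example, and neither
snapshot isolation nor a unique constraint catches it:

  -- rule we'd like to keep: at least one doctor is always on call
  -- T1: select count(*) from doctors where on_call = 1;  -- sees 2, safe
  -- T2: select count(*) from doctors where on_call = 1;  -- sees 2, safe
  -- T1: update doctors set on_call = 0 where name = 'Alice';
  -- T2: update doctors set on_call = 0 where name = 'Bob';
  -- T1: commit;
  -- T2: commit;  -- no update conflict: the writes touched different rows
  -- result: nobody is on call, though each transaction checked the rule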

>
> I am wondering there should be a more optimal solution in terms
> of performance. Is that really the case that a writer cannot
> increment a value by one and get the incremented value without
> running under the "sophisticated transaction control" of Firebird?

Yes, it's really true. Two writers who start at the same time
will see the same state of the record - say 23. Each will add
one and try to write '24'. There are two ways to avoid that if
you're updating a single record. The easier is to roll back and
retry if you get an update conflict. The other is to do your
original read as a SELECT ... FOR UPDATE WITH LOCK.
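
The WITH LOCK route looks like this (again with my VAL column):

  select val from MyTable for update with lock;
  update MyTable set val = :val + 1;
  commit;

WITH LOCK makes the read itself seize the row, so the second
writer blocks (or fails at once in a no-wait transaction) instead
of both readers seeing 23 and both trying to write 24.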

> Is there a way to avoid using transactions?

No, and transactions aren't your problem. Your problem is that
there's an inherent inconsistency between seeing a non-blocking
stable view of data, and also seeing all the changes going on
in the system.

> As for Firebird transactions, the "no wait" option also allows a
> transaction to immediately fail without waiting for a concurrent update
> to complete. So waiting is not necessary but it is usually the best bet
> if the transaction is about to rerun.

Yes, there's no reason to continue until you're quite sure that the
transaction that blocked you is out of the way.


Good luck,

Ann