Subject Re: [Firebird-Architect] Re: Special Relativity and the Problem of Database Scalability
Author Roman Rokytskyy
Paul,

>> The information required for the two mechanisms is radically different.
>> A commit sequencer needs to know the full read and write sets, so
>> if a transaction counted a million records, the commit sequencer needs
>> to know not only which million records but which records weren't read
>> (e.g. hadn't arrived at that node yet).
>
> Have you actually read and pondered the paper?? A one million record
> update would first occur locally, a single, short commit request would be
> sent out and when it is received back again a local commit is attempted.

I'm not sure that this description is 100% correct.

As I understand it (though I admit that I did not finish reading that
particular paper), the write set is not "persisted" before commit. In
other words, no local write lock is acquired before commit. Only at
commit are all the update statements collected and sent over the
totally ordered messaging system to all nodes, where they are applied
locally.

Otherwise you could have a situation where T1 updates a record on node A,
T2 updates the same record on node B, then both decide to commit, and
the messages sent on commit are ordered so that the write set of T2 comes
before the write set of T1. In this case the local commit of T1 will
succeed on node A but fail on node B, and vice versa for T2.
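
To make the scenario concrete, here is a minimal Python sketch of the
deferred-write variant, in which nothing is applied until the write set is
delivered in total order. The names (WriteSet, Node, deliver) and the
version-check rule are my own assumptions for illustration, not taken from
the paper. Because every node sees the same order and applies the same
check, T1 and T2 get the same outcome everywhere.

```python
from dataclasses import dataclass, field


@dataclass
class WriteSet:
    txn: str
    # record key -> (version the transaction saw, new value); hypothetical layout
    updates: dict


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)      # key -> current value
    versions: dict = field(default_factory=dict)  # key -> version counter
    outcome: dict = field(default_factory=dict)   # txn -> "commit" / "abort"

    def deliver(self, ws):
        # Assumed certification rule: abort if any record has changed
        # since the transaction saw it.
        for key, (seen_version, _) in ws.updates.items():
            if self.versions.get(key, 0) != seen_version:
                self.outcome[ws.txn] = "abort"
                return
        # No conflict: apply the whole write set locally.
        for key, (_, value) in ws.updates.items():
            self.data[key] = value
            self.versions[key] = self.versions.get(key, 0) + 1
        self.outcome[ws.txn] = "commit"


node_a, node_b = Node("A"), Node("B")
for node in (node_a, node_b):
    node.data["r1"] = 0

# T1 ran on node A, T2 on node B; both saw version 0 of record r1.
t1 = WriteSet("T1", {"r1": (0, 10)})
t2 = WriteSet("T2", {"r1": (0, 20)})

# The messaging layer happens to order T2's write set before T1's.
total_order = [t2, t1]
for ws in total_order:
    for node in (node_a, node_b):
        node.deliver(ws)
```

In this sketch both nodes commit T2 and abort T1, so there is no node on
which T1 has already "succeeded" locally.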

So, if this reasoning is correct, there must be some "buffer" between
the actual database pages and the rest of the engine that temporarily
"simulates" writes for the lifetime of a particular local transaction
and then sends the write set over the GCS on commit. This buffer must
also support reads, since nothing prevents me from reading a record I
have just modified within my local transaction.
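
A rough Python sketch of such a buffer, assuming a plain dict stands in for
the database pages and a callback stands in for the GCS send. The names
LocalTransaction, pending and broadcast are hypothetical.

```python
class LocalTransaction:
    """Buffers writes privately and overlays them on reads until commit."""

    def __init__(self, store):
        self.store = store   # stand-in for the real database pages
        self.pending = {}    # key -> value written by this transaction

    def write(self, key, value):
        # The write is only "simulated": the store is not touched.
        self.pending[key] = value

    def read(self, key):
        # A transaction must see its own uncommitted changes first.
        if key in self.pending:
            return self.pending[key]
        return self.store.get(key)

    def commit(self, broadcast):
        # Hand the whole write set to the messaging layer in one message.
        broadcast(dict(self.pending))
        self.pending.clear()


store = {"r1": "old"}
txn = LocalTransaction(store)
txn.write("r1", "new")

own_view = txn.read("r1")   # the transaction sees its own write
shared_view = store["r1"]   # the shared store is still unchanged

sent = []                   # stand-in for messages handed to the GCS
txn.commit(sent.append)
```

The store itself would only change later, when the write set comes back
from the GCS and is applied like any other node's.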

Regarding serialization: the GCS does not have that many communication
primitives: total order, causal order, FIFO order. I am not sure that
causal order is enough to satisfy the serializability requirement
defined by Ann.
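
One reason for the doubt, as a small Python sketch: causal order places no
constraint on concurrent messages, so two nodes may legitimately deliver
them in different orders. The vector clocks and helper names below are my
own assumptions for illustration; this does not settle whether Ann's
requirement can be met some other way.

```python
def happened_before(a, b):
    """True if vector clock a causally precedes vector clock b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def causally_valid(order):
    """True if no message is delivered before one it causally depends on."""
    return not any(
        happened_before(later[1], earlier[1])
        for i, earlier in enumerate(order)
        for later in order[i + 1:]
    )


# Two concurrent write sets: neither vector clock precedes the other.
m1 = ("T1 sets r1=10", (1, 0))
m2 = ("T2 sets r1=20", (0, 1))

order_on_a = [m1, m2]   # node A delivers T1 first
order_on_b = [m2, m1]   # node B delivers T2 first

both_valid = causally_valid(order_on_a) and causally_valid(order_on_b)
```

Both delivery orders satisfy causal order here, yet the nodes would apply
the two write sets in opposite orders.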

Roman