firebird-architect - Re: [Firebird-Architect] Re: Special Relativity and the Problem of Database Scalability

Subject	Re: [Firebird-Architect] Re: Special Relativity and the Problem of Database Scalability
Author	Jim Starkey
Post date	2010-01-30T21:15:09Z

Milan Babuskov wrote:

> Jim Starkey wrote:
>
>>> E.g. two concurrent transactions, Tx1 registers a deposit of $10 in my
>>> account, Tx2 a deposit of $20. Tx1 executes on Node A, Tx2 on node B. Tx1
>>> commits first. When Tx2 commits, it updates its own record (it is oblivious
>>> to the other update under rule 2), the database remains consistent so the
>>> commit proceeds. The Tx1 deposit is lost.
>>>
>>>
>> No, that's covered by the rule that a transaction can't update a version
>> of a record that it didn't / couldn't see. In other words, a classical
>> Firebird update conflict.
>>
>> The mechanism is that each record has a deterministic resolution agent.
>>
>
> Hello,
>
> I apologize in advance if what I'm about to write has already been
> "invented" and dismissed before...
>
> Looking at the discussion you have, it strikes me that people trying to
> build distributed systems are still in the box of the previous systems
> they built and are not able to look at things outside of that box.
> (sorry if my English is a little bit crude).
>
> The whole idea about each transaction reading the value A and storing
> the value B seems very wrong to me in a disconnected, "cloud" environment.
> It calls for synchronization and conflict resolving as long as you have
> it. The problem is how to solve the conflict that propagates through the
> network on nodes and happens 50 nodes away from the point where it was
> really created.
>
> It would be much better if transactions don't store the "old state" and
> "new state" but rather just "delta" state. Now, "delta" is not a simple
> value, because there are many ways to interpret why something had a
> value of "10" before and has "20" now. This calls for abandoning SQL as
> an inadequate tool and replacing it with something that would do the job
> properly.
>

Um, doesn't this introduce the problem that each of, say, 1000 ATM
machines could each withdraw the last $10 in a bank account?

This problem doesn't have anything to do with SQL or relational
databases, but with the nature of data. It's OK to let any number of ATMs
try to withdraw the last $10, but only one should succeed.

Consistency requires communication. Conversely, without communication,
there can't be consistency. (An explicit rule in an ATM balance system
is that accounts can't be overdrawn. This does need to be enforced, and
it can't be enforced without communication.)
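
To make the ATM point concrete, here is a minimal sketch in Python. It is
an illustration only, not code from Firebird or NimbusDB. The names
Account, uncoordinated_withdrawals and coordinated_withdrawals are
hypothetical, and a single lock stands in for whatever communication a
real system would use. In the first function each ATM checks a stale
local copy and ships a delta to be merged later; in the second, every
withdrawal goes through one authoritative check.

```python
import threading

class Account:
    """Hypothetical single authoritative copy of the balance."""
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # Check and update happen atomically, so the no-overdraft rule holds.
        with self._lock:
            if self.balance < amount:
                return False
            self.balance -= amount
            return True

def uncoordinated_withdrawals(start_balance, atm_count, amount):
    """Each ATM validates against its own stale snapshot and emits a delta."""
    deltas = []
    for _ in range(atm_count):
        local_copy = start_balance  # no communication: every ATM sees the old balance
        if local_copy >= amount:
            deltas.append(-amount)
    # Merging the deltas after the fact yields the final balance.
    return start_balance + sum(deltas)

def coordinated_withdrawals(start_balance, atm_count, amount):
    """Every ATM talks to the same account; returns (final balance, successes)."""
    account = Account(start_balance)
    results = []

    def atm():
        results.append(account.withdraw(amount))

    threads = [threading.Thread(target=atm) for _ in range(atm_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance, results.count(True)

print(uncoordinated_withdrawals(10, 1000, 10))  # -9990: all 1000 withdrawals "succeed"
print(coordinated_withdrawals(10, 1000, 10))    # (0, 1): exactly one succeeds
```

Without communication the merged balance is $10 minus 1000 withdrawals of
$10, i.e. -$9990. With the shared check, any number of ATMs may try, but
only one gets the last $10.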

You are essentially arguing for the BigTable / SimpleDB model where a
transaction is restricted to a single row update and where it is
possible to reconcile conflicting transactions after the fact. This, in
fact, works well for social networking because updates are essentially
expendable, and the cost of inconsistency is negligible. Facebook, for
example, really doesn't like losing updates, but is willing to accept
the consequences (i.e. none) if that happens.

ACID is required when data is valuable, volatile, and shared. Remove
any of these requirements, and the BigTable / SimpleDB model is just
fine. But it doesn't work for banks.

> This new language could keep the relational concept for data, but should
> be able to express deltas as functions. A simple example would be
> "increaseBy" operator. Instead of x = x + y, one would use
> increaseBy(x,y) in such language. Instead of storing "y" in new record
> version, one would store "increaseBy(x,y)". Of course, except for basic
> math, all functions would be user-defined at application level. This
> means that developers writing applications for such model would need to
> express each data alteration as a function showing "what" is done, not
> "how".
>
> A potential problem with this approach is that a developer would need to
> know beforehand what kind of "function" might be possible for certain
> table columns. This is why I believe those functions should be defined
> in advance, as part of database metadata - maintained with some kind of
> DDL statements. I haven't thought about this thoroughly, but functions
> that express conflict resolution (i.e. what to do when you have to apply
> multiply() and increase() coming from two different nodes at the same
> time) could also be stored and applied automatically if they are present.
>
> I hope any of this makes some sense to you.
>
> Regards,
>
>
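
For readers following the "delta as function" proposal quoted above, a
rough sketch in Python may help. Everything here is an assumption made
for illustration: increase_by, multiply_by and apply_deltas are
hypothetical names, not part of any existing language or of Firebird.
The sketch stores each change as a function and replays the functions in
the order a node happens to receive them.

```python
from functools import reduce

def increase_by(y):
    """Delta meaning 'add y to whatever the current value is'."""
    return lambda x: x + y

def multiply_by(k):
    """Delta meaning 'scale the current value by k'."""
    return lambda x: x * k

def apply_deltas(value, deltas):
    """Replay a sequence of deltas, in order, against a starting value."""
    return reduce(lambda acc, delta: delta(acc), deltas, value)

# Two deposits commute: either arrival order gives the same balance.
print(apply_deltas(100, [increase_by(10), increase_by(20)]))  # 130
print(apply_deltas(100, [increase_by(20), increase_by(10)]))  # 130

# increase() and multiply() do not commute: the result depends on order,
# which is why the proposal needs explicit conflict-resolution functions.
print(apply_deltas(100, [increase_by(10), multiply_by(2)]))   # 220
print(apply_deltas(100, [multiply_by(2), increase_by(10)]))   # 210
```

Deltas of the same additive kind merge cleanly in any order, so the two
deposits from the earlier example are both preserved. Mixed kinds need a
resolution rule, and a replayed delta on its own does nothing to enforce
a constraint such as "accounts can't be overdrawn".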

--
Jim Starkey
NimbusDB, Inc.
978 526-1376
