Subject | Re: [Firebird-Architect] Re: Special Relativity and the Problem of Database Scalability
---|---
Author | Dalton Calford
Post date | 2010-01-30T16:59Z
I have been watching this thread, and I think the big difficulty lies with
the assumption that data changes. Data has a time of relevance, but it does
not change.
When disk space is limited, you treat older values as having no importance,
and so they are removed from the database. In accounting, no data is ever
irrelevant, so it is never removed. You may archive the data, making it
harder to query, but you do not destroy it. You track every action upon a
tuple in a journal. That journal records all changes, in the order they were
recorded, along with where each change occurred and who performed it.
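To illustrate, here is a minimal Python sketch of such an append-only journal; the field names (origin, performed_by, and so on) are my own assumptions, not a prescription:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)  # frozen: an entry is never altered once recorded
class JournalEntry:
    sequence: int          # position in recording order
    tuple_key: str         # which tuple the action applied to
    change: str            # description of the change itself
    origin: str            # where the change occurred
    performed_by: str      # who performed the change
    recorded_at: datetime  # when it entered the journal

journal: list[JournalEntry] = []  # append-only: entries are never updated or deleted

def record(entry: JournalEntry) -> None:
    journal.append(entry)  # the only legal operation is to append
```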
The journals are divided into batches. When a batch is closed, all batches
in the same period are accumulated and the period becomes 'in transition'.
The period is then reviewed and 'closed', becoming historical. Each node of
the cloud would be sent the information considered relevant to the needs of
its local entities.
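A rough sketch of that batch/period lifecycle, again in Python and purely illustrative:

```python
from enum import Enum

class PeriodState(Enum):
    OPEN = "open"                    # batches are still being recorded
    IN_TRANSITION = "in transition"  # all batches closed and accumulated
    CLOSED = "closed"                # reviewed; now historical and immutable

class Period:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = PeriodState.OPEN
        self.batches: list[list[dict]] = []  # each batch is a list of journal entries

    def accumulate(self) -> None:
        # When every batch in the period is closed, they are accumulated
        # and the period enters the 'in transition' state pending review.
        assert self.state is PeriodState.OPEN
        self.state = PeriodState.IN_TRANSITION

    def review_and_close(self) -> None:
        # After review the period is 'closed' and becomes historical;
        # from this point on, nothing in it may change.
        assert self.state is PeriodState.IN_TRANSITION
        self.state = PeriodState.CLOSED
```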
This structure was developed long before computers or SQL existed. It was
developed for corporations spanning many countries, with multiple
conflicting data needs, while being audited by different countries with
different taxation and record-keeping rules.
The mechanics of the process are being lost as manual bookkeeping is
forgotten, but the techniques that were developed to deal with conflicting
updates solved those problems long ago.
Once again we have data migrating from a single set of books to a large
distributed cloud of data. I do not see why we cannot step back and use
those data rules for all databases, accounting or otherwise.
What this means is that updates would never destroy the older data, and
queries would require specifying the period of interest.
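A minimal sketch of what period-scoped reads over never-destroyed versions might look like; the history structure and timestamps here are assumptions for illustration only:

```python
from bisect import insort

# Full version history per key: (effective_time, value) pairs, never overwritten.
history: dict[str, list[tuple[float, str]]] = {}

def update(key: str, at: float, value: str) -> None:
    # An update appends a new version; it never destroys an older one.
    insort(history.setdefault(key, []), (at, value))

def read(key: str, as_of: float) -> str | None:
    # A query names its period of interest and sees the version
    # that was current at that time.
    for ts, value in reversed(history.get(key, [])):
        if ts <= as_of:
            return value
    return None  # the key did not exist yet in that period

update("balance", 1.0, "100")
update("balance", 2.0, "130")
assert read("balance", 1.5) == "100"  # older data remains queryable
assert read("balance", 3.0) == "130"
```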
In setting up such a system, certain assumptions would need to be made: a
super-transaction, if you will, that goes beyond the current transaction or
connection to include the current data connection point, time period, and
data importance/value.
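Purely as illustration, such a context might look like this in Python; the field names are my guesses at what "connection point, time period and importance/value" could mean in code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SuperTransaction:
    # Context that outlives any single transaction or connection.
    connection_point: str  # the node through which the work arrives
    period: str            # the accounting period the work belongs to
    importance: int        # relative value of the data (e.g. for retention)

ctx = SuperTransaction(connection_point="node-toronto", period="2010-01", importance=3)
# Every journal entry recorded under this context would carry it, so a
# receiving node can judge what is relevant to its local entities.
```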
I hope I am being clear about the concept, as it deserves far more thought
than a simple email can provide. Perhaps it will at least get people
thinking outside the box and realizing that many of the problems currently
being worked on were solved a long time ago.
On 30 January 2010 10:01, Milan Babuskov <milanb@...> wrote:
>
>
> Jim Starkey wrote:
> >> E.g. two concurrent transactions, Tx1 registers a deposit of $10 in my
> >> account, Tx2 a deposit of $20. Tx1 executes on Node A, Tx2 on node B.
> Tx1
> >> commits first. When Tx2 commits, it updates its own record (it is
> >> oblivious
> >> to the other update under rule 2), the database remains consistent so
> the
> >> commit proceeds. The Tx1 deposit is lost.
> >>
> > No, that's covered by the rule that a transaction can't update a version
> > of a record that it didn't / couldn't see. In other words, a classical
> > Firebird update conflict.
> >
> > The mechanism is that each record has a deterministic resolution agent.
>
> Hello,
>
> I apologize in advance if what I'm about to write has already been
> "invented" and dismissed before...
>
> Looking at the discussion you are having, it strikes me that people trying
> to build distributed systems are still thinking inside the box of the
> previous systems they built, and are not able to look at things outside of
> that box. (Sorry if my English is a little bit crude.)
>
> The whole idea of each transaction reading value A and storing value B
> seems very wrong to me in a disconnected, "cloud" environment. It calls
> for synchronization and conflict resolution for as long as you have it.
> The problem is how to solve a conflict that propagates through the
> network of nodes and surfaces 50 nodes away from the point where it was
> really created.
>
> It would be much better if transactions did not store the "old state" and
> "new state" but rather just a "delta" state. Now, a "delta" is not a simple
> value, because there are many ways to interpret why something had a
> value of "10" before and has "20" now. This calls for abandoning SQL as
> an inadequate tool and replacing it with something that would do the job
> properly.
>
> This new language could keep the relational concept for data, but should
> be able to express deltas as functions. A simple example would be an
> "increaseBy" operator. Instead of x = x + y, one would use
> increaseBy(x,y) in such a language. Instead of storing "y" in a new record
> version, one would store "increaseBy(x,y)". Of course, except for basic
> math, all functions would be user-defined at the application level. This
> means that developers writing applications for such a model would need to
> express each data alteration as a function showing "what" is done, not
> "how".
>
> A potential problem with this approach is that the developer would need to
> know beforehand what kind of "function" might be possible for certain
> table columns. This is why I believe those functions should be defined
> in advance, as part of the database metadata - maintained with some kind of
> DDL statements. I haven't thought about this thoroughly, but functions
> that express conflict resolution (i.e. what to do when you have to apply
> multiply() and increase() coming from two different nodes at the same
> time) could also be stored and applied automatically if they are present.
>
> I hope any of this makes some sense to you.
>
> Regards,
>
> --
> Milan Babuskov
>
> ==================================
> The easiest way to import XML, CSV
> and textual files into Firebird:
> http://www.guacosoft.com/xmlwizard
> ==================================
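To make the delta idea above concrete, here is a minimal Python sketch; the DELTAS registry stands in for the metadata-defined functions Milan describes, and the function names are hypothetical:

```python
from typing import Callable

# Stand-in for metadata-defined delta functions (in Milan's scheme these
# would be declared via DDL and stored as part of the database metadata).
DELTAS: dict[str, Callable[[float, float], float]] = {
    "increaseBy": lambda x, y: x + y,
    "multiplyBy": lambda x, y: x * y,
}

def apply_deltas(initial: float, log: list[tuple[str, float]]) -> float:
    # Replay stored deltas ("what" was done), not stored states.
    value = initial
    for name, arg in log:
        value = DELTAS[name](value, arg)
    return value

# The lost-update example from earlier in the thread: two nodes each record
# a deposit as a delta. increaseBy commutes with itself, so replaying the
# logs in either order yields the same balance and neither deposit is lost.
node_a = [("increaseBy", 10.0)]
node_b = [("increaseBy", 20.0)]
assert apply_deltas(100.0, node_a + node_b) == 130.0
assert apply_deltas(100.0, node_b + node_a) == 130.0
# Mixing multiplyBy with increaseBy does not commute, which is exactly why
# Milan proposes storing explicit conflict-resolution functions as well.
```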