Subject: Re: [Firebird-Architect] Re: The Wolf on Firebird 3
Author: Jim Starkey
Dmitry Yemanov wrote:

>>Borland dropped it in favor of their write ahead log, which was itself dropped.
>
>Borland's WAL never worked in public, IIRC.

The idea of journalling is that a transaction isn't considered committed
until the bits reside on two separate pieces of oxide. The Borland guys
forgot about this and implemented a combined transaction/after-image
journal, thinking they could get both the safety of journalling and the
performance of a single-write commit. What they had actually done was
insert a single point of failure from which no recovery was possible.
They were demonstrating the code at a users' meeting when they finally
understood what Ann and I had been telling them for six months. By that
point they had destroyed the journalling code, and the WAL didn't look
like much fun anymore.
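
A minimal sketch of the commit rule described above, assuming a toy model in which a "device" is just an in-memory list of records. The names Device, commit and surviving_copies are hypothetical and are not Firebird code. The point is only that a commit is acknowledged after the after-image exists on two independent devices, so losing either one still leaves a readable copy, and that a configuration where the journal and the only copy share one device is refused.

```python
from typing import List, Tuple


class Device:
    """Hypothetical stand-in for one physical disk ("piece of oxide")."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: List[Tuple[int, bytes]] = []
        self.failed = False

    def write(self, txn_id: int, after_image: bytes) -> None:
        if self.failed:
            raise IOError(f"{self.name} is unavailable")
        self.records.append((txn_id, after_image))


def commit(txn_id: int, after_image: bytes, database: Device, journal: Device) -> bool:
    # Keeping the only copy of a change on the same device as the journal
    # is a single point of failure, so refuse that configuration outright.
    if database is journal:
        raise ValueError("journal must be on a separate device from the database")
    database.write(txn_id, after_image)
    journal.write(txn_id, after_image)
    # Only after both writes succeed is the transaction considered committed.
    return True


def surviving_copies(txn_id: int, devices: List[Device]) -> int:
    """Count the devices that are still readable and hold the transaction."""
    return sum(
        1
        for d in devices
        if not d.failed and any(t == txn_id for t, _ in d.records)
    )
```

With this rule, losing either device after the commit still leaves one readable copy of the transaction.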

>>It would be easy to reimplement, however. But based on experience, I
>>don't think anyone would actually use it.
>
>As I have enough experience using such a recovery with our Oracle and MSSQL
>customers, I disagree.

The engine side is fairly simple. A second buffer is allocated when a
page is marked for change. Page changes are recorded in the second
buffer (if the list of changes grows longer than the page itself, the
second buffer is released and the whole page is journalled instead).
When journalling is enabled, the database first copies itself to the
journal, using the same scheme as shadow creation. When a page is
written, either its change records or the page itself is written to
the journal.
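
A minimal sketch of the page-level scheme described above, assuming a toy in-memory model: a page is a bytearray, a change record is an (offset, bytes) pair with a fixed 8-byte header, and the journal is a Python list. The class and function names, PAGE_SIZE, CHANGE_HEADER and the record format are all hypothetical, not Firebird's actual structures. replay() also assumes the journal is applied on top of a full base copy of the database, which is what the shadow-style initial copy provides.

```python
from typing import Dict, List, Optional

PAGE_SIZE = 4096      # hypothetical page size in bytes
CHANGE_HEADER = 8     # hypothetical per-record overhead (offset + length)


class Change:
    def __init__(self, offset: int, data: bytes) -> None:
        self.offset = offset
        self.data = data

    def size(self) -> int:
        return CHANGE_HEADER + len(self.data)


class BufferedPage:
    def __init__(self, page_no: int, data: bytearray) -> None:
        self.page_no = page_no
        self.data = data
        self.changes: Optional[List[Change]] = None  # the "second buffer"
        self.change_bytes = 0
        self.full_page = False  # set once the change list has overflowed
        self.dirty = False

    def mark(self) -> None:
        # Allocate the second buffer when the page is first marked for change.
        if self.changes is None and not self.full_page:
            self.changes = []
        self.dirty = True

    def modify(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("change falls outside the page")
        self.mark()
        self.data[offset:offset + len(data)] = data
        if self.changes is not None:
            change = Change(offset, bytes(data))
            self.changes.append(change)
            self.change_bytes += change.size()
            if self.change_bytes > PAGE_SIZE:
                # The change list is longer than the page: release it and
                # journal the whole page instead.
                self.changes = None
                self.change_bytes = 0
                self.full_page = True

    def write(self, journal: list) -> None:
        # On page write, journal either the change records or the page image.
        if not self.dirty:
            return
        if self.full_page:
            journal.append(("page", self.page_no, bytes(self.data)))
        else:
            journal.append(("changes", self.page_no, list(self.changes or [])))
        self.changes = None
        self.change_bytes = 0
        self.full_page = False
        self.dirty = False


def replay(journal: list, pages: Dict[int, bytearray]) -> None:
    """Roll a base copy of the database (pages) forward from the journal."""
    for kind, page_no, payload in journal:
        if kind == "page":
            pages[page_no] = bytearray(payload)
        else:
            page = pages[page_no]
            for change in payload:
                page[change.offset:change.offset + len(change.data)] = change.data
```

Journalling change records keeps the journal small for lightly modified pages, while the fallback to the full page bounds each journal entry at roughly one page.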

In Classic, there is no alternative to a journal server. Operationally,
however, a journal server is probably the best way to go anyway, so that
all databases get journalled to the same subsystem.

The problem with journals is that they eventually fill up and require
manual intervention. Maybe with contemporary, infinitely large disks
this is no longer true.

Personally, I think journalling is a dumb way to handle disaster
recovery. So much can go wrong, and recovery takes so long, that it is
almost unthinkable. I believe that a disaster recovery strategy based on
replication to hot backup copies is the only way to go. Firebird needs
two things: an online physical copy facility to initialize a replicant,
and an integrated replication scheme.

I'm not going to complain if somebody wants to write a subsystem to roll
back history to a given point, but I have trouble imagining anyone ever
using it. Let's see: we've got a recovery scheme that requires you to
take your web server off the air for a couple of hours while we apply an
undo log. Right. (Meaning "wrong", "won't fly", and "you've got to be
kidding".) It may have some appeal to client/server types who haven't
learned about the web, however. But they didn't like it last time, so I
doubt they'll like it any better in the future. But not my call.

I will agree that Firebird doesn't have an acceptable disaster recovery
strategy. That is the problem to be addressed.