Subject: Re: Remote Shadows (again...)
Author: Roman Rokytskyy
Jim,

> Sure, you could use UDP. But by the time you've built guaranteed
> sequential delivery, you've reproduced TCP.
>
> I guess that multi-cast could be used to talk to multiple shadow
> servers, but making that reliable would be very, very hard.

We wanted to avoid multiple TCP connections to the same server, didn't
we? Using UDP requires one open socket per node (CS instance or shadow
server). The reliability is hidden in the stack itself (sequence
numbers for messages plus buffering when a hole is detected).

This is not easy at all, though. The only realistic hope is to find an
existing library; otherwise it makes no sense to start coding it
ourselves.
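
Just to show what such a stack has to do on the receiving side, here
is a toy sketch in Java (all names are made up; a real stack would
also schedule retransmission requests and timeouts):

import java.util.SortedMap;
import java.util.TreeMap;

// Toy sketch of in-order delivery over an unordered, lossy transport.
class SequencedReceiver {
    private long nextExpected = 0;                    // next sequence number to deliver
    private final SortedMap<Long, byte[]> holeBuffer = new TreeMap<>();

    // Called for every datagram that arrives, in whatever order.
    void onDatagram(long seq, byte[] payload) {
        if (seq < nextExpected) {
            return;                                   // duplicate or already delivered: drop
        }
        holeBuffer.put(seq, payload);                 // park it until any hole is filled
        // Deliver the longest contiguous run we now have.
        while (holeBuffer.containsKey(nextExpected)) {
            deliver(holeBuffer.remove(nextExpected));
            nextExpected++;
        }
        // If a hole remains, a real stack would now send a negative
        // acknowledgment (NAK) asking the sender to retransmit.
    }

    private void deliver(byte[] payload) {
        // hand the in-order message to the shadow writer
    }
}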

> >
> >> On the other hand, if the shadow server is going to
> >> shadow more than one database, maybe that's unavoidable. Second, to
> >> guarantee database integrity, the protocol would have to sequence
> >> messages.
> >>
> >
> > Automatically guaranteed by the protocol stack - you specify only the
> > required properties (FIFO, Total Ordering).
> >
> Sorry, it isn't. If two processes on one machine do writes on TCP
> sockets to a single remote process, there is nothing that guarantees
> in what order they will be delivered, even with an explicit flush.
> Unless you have some magic up your sleeve, the only way to
> synchronize two classic processes talking to a single shadow server
> is for one to wait for an acknowledgment before releasing the lock on
> the page. On a single socket, TCP guarantees sequenced delivery. On
> two or more, you pays your money and takes your chances...

Why would I need to wait until the ACK comes from the shadow? Assuming
that our protocol stack guarantees total ordering of messages and
reliable delivery, I should, in theory, be able to send a message with
the page update and forget about it. Thanks to careful writes, the
shadow will always be consistent, but it may (I admit it) represent
the state of the database as it was, say, two minutes ago. That is
quite acceptable for many applications.

Where is the failure in my logic?
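
To make the idea concrete, the sending side could look roughly like
this sketch (in Java; GroupChannel is a made-up placeholder for
whatever group communication library we would end up using):

import java.nio.ByteBuffer;

// Made-up stand-in for a group communication stack configured for
// reliable, totally ordered delivery.
interface GroupChannel {
    void send(byte[] message);   // returns as soon as the stack accepts the message
}

class PageShadower {
    private final GroupChannel channel;

    PageShadower(GroupChannel channel) {
        this.channel = channel;
    }

    // Fire and forget: the page goes out in the order the engine wrote
    // it, and we never wait for the shadow's acknowledgment.
    void onPageWrite(long pageNumber, byte[] pageImage) {
        ByteBuffer buf = ByteBuffer.allocate(8 + pageImage.length);
        buf.putLong(pageNumber).put(pageImage);
        channel.send(buf.array());   // ordering and retransmission are the stack's job
    }
}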

> > OK, here it is the same - joining the group is expensive. The only
> > possible solution is to join it and stay connected.
> >
> That, by definition, doesn't work for classic.

I was considering an xinetd-like approach: let the processes stay in
memory, ready for reuse.

> But there are lots of alternatives that could be investigated. For
> example, a single forwarding process connected to classic by IPC.
> Probably OK for classic but utterly unnecessary for superserver.
> Again, my point: in a combo code base, the existence of classic slows
> down superserver.

Yes, that sounds correct. However, I cannot agree with a proposal to
drop classic entirely - it should still be available for building
shared-disk clusters (with cheap NAS attached via NFS). Sorry if I
misread your proposal.

> Are you talking about the forwarding process described above or
> suggesting that the shadow server could be on the same machine?

Yes.

> The latter, in my usual humble opinion, is no good for server
> failover, which is how we got into the discussion in the first
> place.

Agreed. But if we consider SS, the same issue applies - if my primary
crashes, my database is gone. The shadow in this case will contain an
"incomplete" copy of the original database (because of the buffering
in that middleware process), or it will contain an exact copy if each
classic instance waits for an ACK from the shadow server.
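
For comparison, the exact-copy variant - each writer blocking on the
ACK - would look roughly like this (again, all names are made up):

import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Made-up ack-capable variant of the channel from the sketch above.
interface AckingChannel {
    // Completes when the shadow has durably received the update.
    Future<Void> send(byte[] message);
}

class SynchronousPageShadower {
    private final AckingChannel channel;

    SynchronousPageShadower(AckingChannel channel) {
        this.channel = channel;
    }

    // Exact-copy mode: block until the shadow acknowledges, so a
    // primary crash can never lose a page that was already released.
    void onPageWrite(byte[] encodedUpdate) throws Exception {
        channel.send(encodedUpdate).get(5, TimeUnit.SECONDS); // wait for the ACK
        // only after this returns may the caller release the page lock
    }
}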

> For what it's worth, Interbase was born with everything in the
> engine to support a shadow server -- everything. Rather than a
> shadow server, however, the engine talked to a journal server. The
> basic scheme was that on first page update the cache manager
> allocated a second buffer to hold deltas. When the page was
> written, the deltas (or the page itself, if the accumulated length
> of the deltas overflowed the secondary buffer) were blasted to the
> journal server (gltj, for you Interbase historians). If the code
> were still there, all that would be necessary would be to replace
> (or augment) gltj to maintain a shadow rather than a sequential
> journal file.

If I understand correctly, at that time Interbase was using the
classic architecture. So it should still be possible (given unlimited
resources and time).
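
As I read that scheme, the cache manager's side could be sketched
roughly like this (all names are made up, certainly not the historical
code):

import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Rough reconstruction of the described delta-buffer scheme.
class PageDeltaLog {
    private final int pageSize;
    private final ByteArrayOutputStream deltas = new ByteArrayOutputStream();
    private boolean overflowed = false;

    PageDeltaLog(int pageSize) {
        this.pageSize = pageSize;
    }

    // Record one change (offset + new bytes) applied to the cached page.
    void recordDelta(int offset, byte[] newBytes) {
        if (overflowed) {
            return;                         // already decided to ship the whole page
        }
        if (deltas.size() + 8 + newBytes.length >= pageSize) {
            overflowed = true;              // deltas now cost more than the page itself
            return;
        }
        ByteBuffer rec = ByteBuffer.allocate(8 + newBytes.length);
        rec.putInt(offset).putInt(newBytes.length).put(newBytes);
        deltas.writeBytes(rec.array());     // append the delta record
    }

    // When the cache manager writes the page, ship the cheaper of the two.
    byte[] messageForJournal(byte[] pageImage) {
        return overflowed ? pageImage : deltas.toByteArray();
    }
}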

Again, I see the merit of CS in being able to run on different hosts
and work with the same database file attached via NFS (though that
requires a DLM). I have no problem with saying that the remote shadow
feature is available only in SS and clustering only in CS, but I want
both features.

Roman