firebird-architect - Re: Remote Shadows (again...)

Subject	Re: Remote Shadows (again...)
Author	m_theologos
Post date	2006-10-05T08:46:50Z

**** For Alex Peskhov (especially, but not only): while writing this
message I remembered something important for our discussion.
Details below!

--- In Firebird-Architect@yahoogroups.com, Alexandre Benson Smith
<iblist@...> wrote:

>
> Jim Starkey wrote:
> >
> >
> > I haven't a clue as to how to do it in classic. The superserver version
> > would probably take less than a week to implement included the server.

> >
> > --
> >
> > Jim Starkey
> > Netfrastructure, Inc.
> > 978 526-1376
> >
> >
>
>
> Hi !
>
> Sorry if this is just stupid, but in classic there is a point where the
> page is written to disk, two processes can't write a single page at the
> same time right ?
>

Wrong, IMHO. There is no point in classic where two processes write
at the _same_ time. Classic works with 'c:\MainDatabase.FDB' and
'c:\MyShadow.sdw' through the lock manager (CIIF), and, according to
the Win32 SDK documentation, it will work in the same way with
'\\SERVER-A\MyShare\MyShadow.sdw'.
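
To make the idea concrete, here is a minimal Python sketch (not
Firebird code) of mirroring one page write into a main file and a
shadow file at the same offset. The function name, the page size and
the paths are assumptions made up for illustration. The sketch does no
locking of its own; serialising the writers is the lock manager's job,
as described above.

```python
import os

PAGE_SIZE = 4096  # assumed page size, for illustration only


def write_page_mirrored(main_path, shadow_path, page_no, data):
    """Write one page at the same offset in the main and the shadow file.

    Hypothetical helper: it only shows that the shadow write is an
    ordinary file write to a second path.
    """
    if len(data) != PAGE_SIZE:
        raise ValueError("a page must be exactly PAGE_SIZE bytes")
    for path in (main_path, shadow_path):
        # Open for update without truncating; create the file if missing.
        mode = "r+b" if os.path.exists(path) else "w+b"
        with open(path, mode) as f:
            f.seek(page_no * PAGE_SIZE)
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the page to storage
```

The shadow path is just another file name handed to the same file
API, so in this sketch nothing changes if it points at a share such
as '\\SERVER-A\MyShare\MyShadow.sdw' instead of a local disk.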

======== ********* Ahhhhhh!!!.... Now I remembered!! Alex! (Peskhov)
Pay attention!:

It works! (Almost) for sure! (sorry for yelling...)

How do I know?

Reason #1
At the beginning of the year we did a major restructuring of our
network, and at the same time I made some small improvements to our
applications (which then had Firebird 1.5 as backend). Among other
important things, we installed two new servers: SERVER-A with FB SS
on it, and SERVER-B as a backup server.
Suddenly, strange behaviour appeared: the application(s) worked for
some time _without_problems_, and then, after a while, users who
tried to open the same application had to wait much longer than
normal for it to open, and finally the application(s) began to throw
very ugly error messages; only Ctrl+Alt+Del + kill task brought
things back to normal. This did not happen after the same amount of
time each time, nor on the same workstations... The issue kept us
digging for one or two months until I (finally!) found what was
wrong.

The clue?...

Due to upgrade issues (migrating user accounts, shares, etc.)
SERVER-A got a new IP address: 192.168.0.110. All our programs
pointed to the old server, i.e. the connection string was, for
example, 192.168.0.101:\FDB\ERP.FDB. I, wise as always, thought that
hardcoding the IP address in the program was a bad thing, so I
changed it to \\SERVER-A\FDB\ERP.FDB

Can you see it?...

Because we had done a full installation of the FB SS architecture on
all our workstations (don't ask why), each workstation, when it saw
\\SERVER-A\FDB\ERP.FDB, acted as a local server (because fbclient.dll
resolves this as a file for the local server... hummm...
developers...) working on a remote file in a Windows share, so our
main server was, in fact, unused. Of course, in almost two months of
heavy crash testing (~60 workstations reading/writing _simultaneously_
without any lock manager to handle this pressure on the databases; 7
fdb with different structures, sizes and usage profiles) not one bit
was corrupted! No gfix, no mend, no trx in limbo, no need for
backup/restore, no nothing. We worked with the same databases for
almost 9 months without any performance degradation, until I upgraded
them to ODS 11. And that even though workstations (i.e. in fact many
'small' servers) stalled, applications were killed, computers were
reset, etc. Note that we don't have force writes enabled (at least on
some fdb). So our crash test ended successfully before it began. But
anyway, if you want to run some tests specifically for the shadows
problem, and you change the parsing to allow this, I am at your
disposal.

For the record, the correct connection string should have been
SERVER-A:F:\databases\FDB\ERP.FDB
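
The difference between the two string forms can be sketched in
Python as below. This is an illustration only, not Firebird's real
parser: the function name and the returned labels are made up, and
how fbclient.dll actually treats each form is as reported in this
message, not something the sketch verifies.

```python
import re


def classify_connection_string(conn):
    """Return (kind, host, path) for the string forms discussed here.

    Hypothetical helper: 'unc' is a Windows share path, 'tcp' is the
    host:path form, 'local' is anything else (e.g. C:\\FDB\\ERP.FDB).
    """
    if conn.startswith("\\\\"):
        # UNC form: \\HOST\rest-of-path
        host, _, rest = conn[2:].partition("\\")
        return ("unc", host, rest)
    # host:path form; a single letter before ':' is a drive, not a host
    m = re.match(r"^([^:\\/]{2,}):(.+)$", conn)
    if m:
        return ("tcp", m.group(1), m.group(2))
    return ("local", None, conn)


for s in (r"192.168.0.101:\FDB\ERP.FDB",
          r"\\SERVER-A\FDB\ERP.FDB",
          r"SERVER-A:F:\databases\FDB\ERP.FDB"):
    print(classify_connection_string(s))
```

Only the first and the third string name a server to connect to; the
second one is a file path on a share, which is the case that went
wrong for us.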

Reason #2
A network plier (crimping tool). A very dear brother of mine knows
very well how to make network cables and also owns a network plier.
When we expanded the network he helped us a great deal, and we are
very grateful for this. But his network plier had a small problem: it
didn't press the contacts tightly enough, although the man did his
best. Unfortunately none of us, not even our brother, knew this.
After a while, suddenly, the main switches of our backbone started to
crash one after another. Reset switch #1, bring it online again, look
at the logs. Nothing. Same procedure for the other switches. Same
result. FB SS 1.5.3 has the very bad 'feature' that in such
situations it stalls, and not even stopping/starting the service
works anymore. Only the famous Ctrl+Alt+Del... After a long time
searching we saw that some LAN ports were working at 100Mb and even
at 10Mb instead of the normal 1Gb. The switches were literally
crushed by the large amount of bad packets and retransmit attempts.
That's why they gave up and crashed, and the whole network went down.
But again, no problems with the databases, and no performance
degradation in the fdb apps. So, IMHO, we can safely implement remote
shadows.

HTH,

m. th.

> Why doesn't this same piece of code write to disk and send the page
> content to the "shadow server" over the wire ?
>
> If only one process could write on a specific data page at a given
> time, I think it could send it over too.
>
> see you !
>
>
> --
> Alexandre Benson Smith
> Development
> THOR Software e Comercial Ltda
> Santo Andre - Sao Paulo - Brazil
> www.thorsoftware.com.br
>