Subject: Windows Stability - or lack of ...
Author: Lester Caine
Having lost a number of sites on the railways to political decisions
rather than technical ones, even where we offered a lower-cost system, I
have been asked to help out and explain the problems that brought down
an installation from a competitor. I can't go into too much detail, but
I'm looking for documentation on how the crash that took out the system
could have been avoided, and how using Firebird would have prevented the
situation.

The system had been installed with a 4GB partition for Windows Server
2003 (W2003) on the server's main disk, with the swap file on a second
partition (and an alternative database on a third partition ;) ). The
machine also had only 1GB of RAM. This should have given cause for
concern from day one, but that is history. 'Problems' with the system
caused the database log file on the system partition to grow, although
this file was supposedly designed to leave 100MB of free space on the
disk (partition). Needless to say, the partition filled up, leaving W2003
no space to work in, and it crashed trying to recover from being out of
space. My own understanding is that Windows has never been stable when
it runs out of disk space, and that it even tries to write entries to
the event log after it knows the disk has no space to hold them. Is that
right?
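For illustration, the kind of guard the log file was supposed to have ("leave 100MB free") might look like the minimal sketch below. It assumes Python, and `guarded_append` and `can_append` are hypothetical helpers, not the competitor's actual code: check free space on the volume before each append and refuse the write if it would eat into the reserve.

```python
import os
import shutil

# Assumed reserve: the post says the log was meant to leave 100MB free.
RESERVE_BYTES = 100 * 1024 * 1024


def can_append(free_bytes: int, write_bytes: int,
               reserve_bytes: int = RESERVE_BYTES) -> bool:
    """True only if writing write_bytes still leaves reserve_bytes free."""
    return free_bytes - write_bytes >= reserve_bytes


def guarded_append(log_path: str, data: bytes,
                   reserve_bytes: int = RESERVE_BYTES) -> bool:
    """Hypothetical helper: append data to log_path unless that would
    eat into the reserve on the volume holding the log.

    Returns False (and writes nothing) when the reserve would be breached,
    so the caller can rotate, archive or raise an alert instead.
    """
    directory = os.path.dirname(os.path.abspath(log_path))
    free = shutil.disk_usage(directory).free
    if not can_append(free, len(data), reserve_bytes):
        return False
    with open(log_path, "ab") as log:
        log.write(data)
    return True
```

Note that a guard like this only protects the reserve from the log itself. Anything else writing to the same partition, Windows included, can still consume that space, so it is no substitute for watching the partition as a whole.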

The obvious point here is that the disk partition should never have been
allowed to run out of space. The 100MB of leeway was applied only to the
secondary logging, with nothing monitoring whether Windows itself had
enough space left to actually work. The 1GB of RAM was too small and
resulted in a lot of swap file activity, and because the swap file was
just another partition on the same disk, this only added to the
problems. Although the 'non-Firebird' database was on another partition,
it also suffered corruption, and the backup server was basically in the
same state as the main server, so the whole site went tits up!
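The missing piece was something watching the partitions Windows itself depends on. A minimal monitoring sketch, assuming Python's standard `shutil.disk_usage`, a hypothetical `low_space_volumes` helper and an illustrative 1GB threshold. In practice a check like this would run on a schedule and raise an alert well before the system partition is full.

```python
import os
import shutil
from typing import Dict, List, Tuple

MB = 1024 * 1024


def low_space_volumes(minimums: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (path, free_bytes, minimum_bytes) for each volume whose
    free space has dropped below its configured minimum."""
    alerts = []
    for path, minimum in minimums.items():
        free = shutil.disk_usage(path).free
        if free < minimum:
            alerts.append((path, free, minimum))
    return alerts


if __name__ == "__main__":
    # Hypothetical threshold: keep at least 1GB free on the system volume.
    # os.path.abspath(os.sep) is the root of the current drive ("C:\\" on a
    # typical Windows server, "/" elsewhere); add the swap and database
    # volumes to this dict as needed.
    system_root = os.path.abspath(os.sep)
    for path, free, minimum in low_space_volumes({system_root: 1024 * MB}):
        print(f"ALERT: {path} has {free // MB}MB free, "
              f"minimum is {minimum // MB}MB")
```

The 1GB figure is only illustrative. The right margin depends on the swap file, event logs and update needs of the particular server.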

I know that I would not have allowed the system to get into the state it
was in, and that any backup machine would have been managed differently,
but now I need some evidence to show that what happened was avoidable,
and that the steps we would have taken would have prevented the total
system collapse. I can remember reading some white papers on the key
elements of this (comparisons with how Linux behaves, and recommendations
for Windows server setup), but I can't put my hands on anything useful
at the moment. The links to Microsoft no longer give me valid pages, so
can anybody give me some 'ammunition' to back up my presentation on this?

--
Lester Caine - G8HFL
-----------------------------
L.S.Caine Electronic Services - http://home.lsces.co.uk
Model Engineers Digital Workshop -
http://home.lsces.co.uk/ModelEngineersDigitalWorkshop/
Treasurer - Firebird Foundation Inc. - http://www.firebirdsql.org/index.php