Subject: Re: [IB-Architect] Re: License Question
Author: Jim Starkey
At 08:36 PM 3/24/00 -0500, Emil Briggs wrote:
>From: Emil Briggs <emil@...>
>> The Interbase generic lock manager for Unix is based on Sys-V
>> shared memory (ugh) and semaphores (double ugh). However, it
>> still needs to signal processes that are holding a lock blocking
>> somebody in hopes they will down grade it. Unix protection
>> limits to whom a process can signal. If a process finds it
>> is unable to signal somebody, it asks the "gds_lock_mgr",
>> running as root, to send the signal instead. A bit clunky,
>> but the best that can be done on an operating system designed
>> in the 1970s.

>OK. Now things are starting to make sense. I would argue though
>that a process that holds a lock that it doesn't need is a bug
>that needs fixing. Unless there is some other reason for doing it
>that way?

Indeed there is, sir! Database systems regularly revisit certain
pages. Interbase, for example, uses "pointer pages" to keep track
of data pages assigned to a particular table; a reference to a
particular record goes through the pointer page to the data page
on which the record (or at least the head of the record) resides.
To actually construct the record, a fragmented tail chain may be
traversed, old version pointers followed, blob references followed
(you get the picture). If the database system had to read every
page from the disk on every reference, it would be slow like a
pig (like Oracle?).
So Interbase, like every other database in the world (save, maybe
MySQL) keeps a page cache. Interbase, quite reasonably, keeps
a page in cache, locked of course, until either somebody else wants
it (a blocking AST) or Interbase has a better use for the cache
buffer. It is much, much faster to keep stuff in cache and
release it on demand than to read it each time.
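A minimal sketch of the idea, in Python (the class and method names
here are mine, not Interbase's): pages stay in cache, notionally
locked, until either a blocking AST says someone else wants one, or
the cache has a better use for the buffer.

```python
from collections import OrderedDict

class PageCache:
    """Toy page cache: holds pages (and, conceptually, their locks)
    until asked to give one up or until the buffer is needed."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_no -> data; order tracks LRU

    def fetch(self, page_no, read_from_disk):
        if page_no in self.pages:
            # Cache hit: no disk I/O, the lock is already held.
            self.pages.move_to_end(page_no)
            return self.pages[page_no]
        if len(self.pages) >= self.capacity:
            # A better use for the buffer: drop the least recently
            # used page (releasing its lock as a side effect).
            self.pages.popitem(last=False)
        data = read_from_disk(page_no)
        self.pages[page_no] = data
        return data

    def blocking_ast(self, page_no):
        # Somebody else wants this page: release it on demand.
        self.pages.pop(page_no, None)
```

Repeated references to the same pointer page cost nothing after the
first read, which is the whole point.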

As you are probably aware, Interbase concurrency control has nothing
(yes, Virginia, absolutely nothing) to do with page locking. Page
locks are used solely for cache coherency and do not follow a
two-phase locking protocol. Concurrency control is managed by
record versioning (a different pontification).
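In miniature, record versioning looks something like this sketch
(my own simplification, not Interbase's on-disk format): each update
chains a new version stamped with the writer's transaction id, and a
reader walks the chain until it finds a version whose writer had
committed before the reader started.

```python
class Record:
    """Toy versioned record: newest version first in the chain."""

    def __init__(self):
        self.versions = []           # list of (txn_id, value), newest first

    def write(self, txn_id, value):
        # An update never overwrites; it prepends a new version.
        self.versions.insert(0, (txn_id, value))

    def read(self, committed):
        # 'committed' is the set of transaction ids the reader saw as
        # committed when it started. Skip versions from later writers.
        for txn_id, value in self.versions:
            if txn_id in committed:
                return value
        return None                  # no visible version yet
```

No page-level lock is needed for a reader to get a consistent answer;
it simply ignores versions it isn't entitled to see.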

And, incidentally, the ability to make any page available on demand
imposed another rigorous requirement on the Interbase implementation:
careful write. Careful write means that individual page writes
are sequenced so the database on disk is ALWAYS valid and consistent.
That's why, assuming the operating system performs atomic writes
(a fiction, but somebody else's problem), you can hit a computer
running Interbase with an axe and, assuming you miss the disk,
the database on disk is guaranteed valid, consistent, and ready
to go as soon as the replacement computer arrives.
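The ordering discipline can be sketched in a few lines (the
structures and names below are hypothetical, for illustration only):
when a record lands on a fresh data page, the data page is written
before the pointer page that will reference it, so a crash between
the two writes leaves at worst a harmless orphan page, never a
pointer to garbage.

```python
def careful_insert(disk, table, record, new_page_no, write_log):
    """Insert 'record' on a new data page using careful-write ordering.
    'disk' is a dict standing in for on-disk pages; disk[table] is the
    pointer page. 'write_log' records the order writes hit the disk."""
    # 1. Write the new data page itself first.
    disk[new_page_no] = [record]
    write_log.append(("data", new_page_no))
    # 2. Only after the data page is safely on disk, update the
    #    pointer page so the rest of the database can reach it.
    #    A crash before this point loses nothing that was committed.
    disk[table].append(new_page_no)
    write_log.append(("pointer", table))
```

Reversing the two writes is exactly what careful write forbids: a
crash after the pointer write but before the data write would leave
a pointer to an unwritten page.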

>My vote (not that we're actually voting) is for pthreads.

Nope. We're not voting. We're "not that kind of club". Part
of the open source transition planning is setting up an architectural
process in which Interbase the Corporation (Ann's World) does not
have veto power over architectural decisions. A useful topic
for discussion.

>> I think you would be better off with a super-server (which never,
>> ever needs to lock) and fast IPC than the architectural overhead
>> of a clustered environment. Remember that the main reason that
>> DEC implemented clusters is that they couldn't design a fast
>> machine. Change the mix of cpu speed to disk bandwidth and
>> clusters (for databases) lose their appeal. At least to me.
>> Probably.
>It's more a question of price/performance. If I can use 6 dual CPU
>machines in a cluster it's a lot more cost effective than a
>single 8 CPU machine. That's not always possible even in HPC
>applications but it's nice if you can get it.

Do remember that the goal of a database designer is to bottleneck
at available disk bandwidth. Disks are the only part of computing
that hasn't gotten significantly faster (ok, transfer rates are
up, but rotational delay and seek times are at best 2X what they
were when you were born). As long as there are enough CPU
cycles to saturate the disk, more don't make it faster (given
classical database architecture). Modern uniprocessors are
unbelievably fast; multi-processors boggle the mind. A cluster
architecture is just not needed to supply the cycles to clog
the disk channel. A cluster architecture does impose a significant
tax on every lock, read, and write, so the net performance gain
on a cluster for a well written* database system is likely to
be negative. Sorry.
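The back-of-the-envelope arithmetic behind that claim (the numbers
below are illustrative assumptions, not measurements):

```python
# A random page read pays a seek plus rotational delay:
seek_ms, rotational_ms = 8.0, 4.0          # plausible disk of the era
io_ms = seek_ms + rotational_ms            # ~12 ms per random read
random_reads_per_sec = 1000.0 / io_ms      # roughly 83 pages/second

# Assume a (generous) CPU cost of 50 microseconds to process a page:
cpu_us_per_page = 50.0
cpu_pages_per_sec = 1_000_000.0 / cpu_us_per_page   # 20,000 pages/second

# One CPU can chew through pages two orders of magnitude faster than
# one disk can deliver them, so adding CPUs (or cluster nodes) buys
# nothing once the disk is saturated.
assert cpu_pages_per_sec > 100 * random_reads_per_sec
```

Change the assumed costs by a factor of a few in either direction and
the conclusion doesn't move: the disk channel clogs first.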

* "well written" means not Oracle. If you use hundreds and hundreds
of programs for years and years to write a database system, you
can build something that requires infinite cpu cycles to eat
the disk bandwidth. But there are cheaper ways to write bad
software. Just my opinion.

Jim Starkey