Subject Re: [IB-Architect] Classic vs. superserver (long, very very long)
Author Nickolay Samofatov
Hello Ann,

Tuesday, October 08, 2002, 7:40:19 PM, you wrote:

> Here's my take on the issue of classic vs. superserver.

> First, I think that maintaining two architectures is a
> gross waste of everyone's time and a serious risk to the
> reliability of Firebird. There are over 500 ifdefs that
> separate the code paths for the two. But the amount of
> conditional code is much less important than the nature
> of that code.

> In the classic architecture, each connection is a process
> that includes the database software as a shared library.
> In Classic, different connections communicate in one of
> two ways: through the lock manager or through the disk.

> In superserver, connections are threads and can communicate
> through shared data structures. Handling the shared
> data structures requires interlocking, but it is much less
> expensive than signalling from one process to another or
> reading and writing pages.

[description of a couple of easily-solvable issues skipped]

Yes, but you mentioned only IPC mechanisms that involve syscalls. Modern
OSes offer other, very effective methods of interprocess communication.
Have you read the Linux or WinNT kernel documentation?

I would fix the FB2 threading model, but right now I have a lot of
work on my own projects. I have fixed FB2 enough to suit my project's
basic needs.
What I think needs to be done is to start from CS and make it use
resources better (and minimize syscalls), then analyze its code to
make it thread-safe and rewrite the y-valve, using the SS code as an
example, to implement a multi-threaded, multi-process model.

> Certainly classic has three well known advantages: SMP,
> failure safety, and the ability to kill a runaway process.

> It's a cheap way to get SMP - but it doesn't scale. In
> particular, classic puts more load on the I/O channels -
> there comes a point where the database is simply I/O bound
> and adding more processors isn't going to help.

It scales. I tested it hard. The FB1 server I configured handles load
from 3000 concurrent users via ~100 active pooled connections, 10 hours
a day, 7 days a week, in a production environment, and has done so for
more than a year on an 8-way SMP machine.

I haven't tested FB2 SS, but FB1 SS and IB6 SS died and corrupted the
database five times a day under similar load.

As I remember that configuration, the buffer cache was essentially
disabled. The OS file cache was huge (~2 GB, but much less than the
database size). No I/O bottlenecks were encountered at all,
notwithstanding that it was a hybrid OLAP/OLTP database.

> A catastrophic error in a single connection doesn't
> affect other connections, while a server crash brings down
> all connections. The server shouldn't crash. Ever.
> Earlier versions of InterBase did not isolate UDF's, and
> allowed user code to crash the server. That's been fixed
> for nearly four years. Now, it's up to us to find and
> fix the places in the server where bad inputs can cause
> crashes.

It will crash sometimes. Just because shit happens.
If you kill one process from the Oracle process pool, probably nobody
will notice the result. This is the way large DBMSes are designed.
I repeat: IPC should be _carefully_ designed around all the features
modern OSes offer: shared memory, pipes, message queues, semaphores,
signals, memory protection, etc. These features are available on most
platforms (just have a look at Oracle's list of supported platforms
and at a syscall trace of the features it uses).

I think an acceptable model is a combined multi-process, multi-threaded
server. This is optimal for some platforms (like Windows) which have
optimized threading, and it is good for stability.

> Borland has found a way to kill runaway queries in SuperServer -
> we can too.

It is relatively easy to do: one flag checked in a couple of places,
and an exception raised.

> Yes, we could add shared page and metadata caching to classic,
> but that's redoing work that's already been done in superserver,
> and it still requires fine granularity locking if it's going
> to run well on an SMP system. What we'd be doing, if we tried
> to build from the classic base, is redoing all the work that's
> already gone into superserver.

Not that simple. The metadata cache need not be shared.

I actually know how to make it work without much overhead.
One way to do it is:

1) Add a version number to each cached object.
2) Allocate a write-protected area in shared memory to store the
object versions associated with the objects.

A couple of syscalls during the first addition of an object to the
read-only shared table (on first read) and during DDL will not change
anything. These calls are not required if we make the shared memory
read-write, but they make it more robust in case of a crash: there is
no way to modify the table except via a special syscall, which is very,
very unlikely to happen by accident.

3) During normal operation, do lookups and compare the locally
cached object version with the version in the shared table.

I can describe similar methods for working with the locks and the
buffer cache. Look at my algorithm: it keeps the amount of
process-shared data to a minimum and protects it. This makes the
system robust and efficient in a clustered environment.

I have designed and implemented several products people have to really
rely on (like a real-time medical database which works with
life-support devices), and I know the basic principles of this work.
They are:
1) There should be no single point of failure.
2) Shit happens.

Your model breaks both principles. It will not be reliable.

> Regards,

> Ann
> We have answers.

Best regards,
Nickolay Samofatov mailto:skidder@...