Subject: Re: [IB-Architect] Classic vs. superserver (long, very very long)
Author: Ann W. Harrison
At 12:57 AM 10/9/2002 +0400, Nickolay Samofatov wrote:

>There are ways of efficient interprocess communication.
>Have you read Linux or WinNT kernel specs ?

No, I haven't. The overwhelming problem with communication
in the classic architecture, at least in my experience, is
not the cost of IPC, but the cost of making it work on all
the variants that appear in such disparate operating
systems as Xenix, VMS, AIX, SCO, the Cobalt subset of Unix
et al. If we limit ourselves to Windows, Linux, and MacOS,
maybe that problem goes away.

>I would fix FB2 threading model but now I have a lot of
>work on my direct projects. I fixed FB2 enough to suit my
>project basic needs.

I'd very much like to explore what you've done.

>What I think needs to be done is to start from CS and make
>it use resources better (and minimize syscalls). Then
>analyze its code to make it thread-safe and rewrite
>y-valve using SS code as an example to implement
>multi-threaded multi-process model.

Could you explain multi-threaded multi-process a bit more?
Currently that's an either/or choice. I can imagine a multi-
threaded version of classic that keeps the process per
connection but adds separate threads to allow parallel I/O
operations, sorts, etc. Is that what you mean?

>It scales. I tested it hard. The FB1 server I configured
>handles load from 3000 concurrent users via ~100 active pooled
>connections 10 hours a day, 7 days a week, in a production environment
>for more than a year on an 8-way SMP machine.

That sounds great... How do you manage security using connection
pools? Does that generalize? Clearly, maintaining open
connections pretty much eliminates the need for a shared
metadata cache. That's only important if you've got processes
(connections) that exit and restart.

>I haven't tested FB2 SS, but FB1 SS and IB6 SS die and
>corrupt the database five times a day under similar load.

Interesting. I thought there were sites using about that
load - meaning about 100 connections shared among many more
users - with SS. What sort of data corruption did you get?

>As I remember that configuration - buffer cache was essentially
>disabled. OS file cache was huge (~2 GB, but much less than database
>size). No IO bottlenecks were encountered at all, notwithstanding
>that it was hybrid OLAP/OLTP database.

Ah. OK. Were you using asynchronous writes? If not, then
the OS file cache was serving as a shared page cache - why
not?? (Except that there are operating systems that don't
offer file caches, but hey, they're pretty much obsolete...)

> > A catastrophic error in a single connection doesn't
> > affect other connections, while a server crash brings down
> > all connections. The server shouldn't crash.
>It will crash sometimes. Just because shit happens.

OK. But in your architecture, if a certain kind of shit
happens (failure between the file system cache and the
disk) you lose everything.

>If you kill one process from the Oracle process pool probably
>nobody will notice the result. This is the way large DBMS

That's not the way either Sybase or MS-SQL works. And on
Windows systems, DB2 uses a thread-per-connection model.

>I repeat, IPC should be _carefully_ designed based on
>all features modern OS offer - shared memory, pipes, message queues,
>semaphores, signals, memory protection, etc.

No question. And as you track through the various bits and pieces,
you'll find that classic uses shared memory - though there was an
implementation once for a system that didn't have it - and pipes,
though pipes have problems when you need to interrupt the guy
at the other end - and message queues, semaphores, signals and
mapped files. None of that stuff is new. It's just a lot of
variants to deal with if the alternative is moving everything
into one process.

We agree, I think, that maintaining two architectures as different
as superserver and classic is a problem. The question is which
architecture offers the best performance in the general case. And,
of course, that requires that we agree on what is the general case.

>These features
>are available on most platforms (just have a look at Oracle list of
>supported platforms and at syscalls trace of features it uses).

And if we had Oracle's funding - if we even had the funding that
Oracle has put into its America's Cup Challenge - we too could
handle two architectures.

>I think acceptable model is combined multi-process multi-threaded
>server. This is optimal for some platforms (like Windows)
>which have optimized threading and is good for stability.

I'd like that better if it didn't require a pool of connections
and didn't violate the requirement for careful writes.

> > Borland has found a way to kill runaway queries
>One flag checked in a couple places and exception raised.

Right. There is the precondition of being able to identify
the runaway thread, but that too is SMOP.

> > Yes, we could add shared page and metadata caching to classic,
> > but that's redoing work that's already been done in superserver,
>Not that simple. The metadata cache need not be shared.

Not if you maintain a connection pool - the cost of recreating
the metadata image on each new connection is considerable.

>1)Add version for each cached object.


>2)Allocate write-protected area in the shared memory to store
>object versions associated with objects.

Sure, but ... isn't that just a variant of a shared metadata
cache - the larger part is kept in the process, to be sure,
but the idea is the same. In your case, the cost is lower
because the connections are stable. Aside from the fact
that shared memory used to be a scarce resource, why not put
the whole metadata cache there?

>3) During normal operation do lookups and compare local
>cached object version and version in the shared table.

That's actually where the problem occurs. Once a request
is compiled, there's no further checking that the assumptions
that went into it are still valid. Nor is there any check
on the validity of structures used by a running request.

If one process is running a query that depends on an index
and another deletes the index, the running query is going
to hit a wall when it reads what it expects to be an index
page and discovers that it has been released and reallocated
as a data page.

>I can describe similar methods of work with locks and buffer cache.
>Look at my algorithm - it keeps process shared data amount at minimum
>and protects it. This makes system robust and efficient in clustered

Clustered? Hmmm. OK. How do you communicate page locks
in a cluster?

>I designed and implemented several products people have to really rely
>on (like real-time medical database which works with life-support
>devices) and I know basic principles of it. They are:
>1) There should be no single point of failure
>2) Shit happens

From what I understand, your reliance on the file cache is a
single point of failure, and endless shit is likely to happen
if someone drops an index on a running system. What have I missed?

>Your model breaks both principles. It will not be reliable.

OK, let's work on making a model that is reliable.


We have answers.