firebird-architect - Re: User name SYSDBA

Subject	Re: User name SYSDBA
Author	johnson_dave2003
Post date	2005-08-11T03:42:09Z

Sorry ... I didn't mean to ignore this posting. I've got a huge
install coming up, and i'm short in the sleep and coffee departments.

This is sliding a bit to the direction of being off topic, and I want
to be sure that I give your observations a fair shake.

We are getting beyond my limited exposure to LDAP and LDAP like
systems. If we need to get more technical than this reply, I will
need to do some serious reading and experimenting first.

--- In Firebird-Architect@yahoogroups.com, Geoff Worboys <geoff@t...>
wrote:

> Yeah, OK. I see that dynamic security assessment is one way
> of having connection pooling work while keeping authorisation
> definition centralised in the database.

In terms of features, an LDAP like security module buys you the
following features effectively and efficiently:

1. separation, and almost divorce, of security model from database
engine concerns
2. elimination of "backdoor" security holes - isql can't break security
3. elimination of application programmer error as a cause of security
breaches
4. seamless, secure, high performance connection pooling
5. superior scalability
6. simple engine maintenance
7. something like Oracle's Virtual Private Databases, only superior

>
> This requirement comes about because, at the moment, a
> "connection" represents both a users session with the database
> and also the communications session with the server process.
>

It is also a convenient programming paradigm that makes code clear and
easy to maintain (reliable). By tying a socket connection to an OS
level thread of execution, transaction processing is essentially:

void socketConectionThreadRun ()
{
while (connected)
{
request = receive();

switch (request)
{
case START: results = startTransaction ();
case EXEC_STATEMENT: results = executeStatement ();
case COMMIT: results = commit ();
case ROLLBACK: results = rollback ();
else
}

send (results);
}
}

This style of code can be readily explained to and maintained by
anyone (except possibly Borland, apparently).

The Windows event model is much harder to follow. I had no problem
writing a single threaded event driven mass migration tool on top of
the TWSocket FTP connection - I really just tied the socket input
wrapped OS call) to the socket output (wrapped OS call) and let it
rip. The code was less than 60 lines of delphi, clearly laid out, and
well documented, but no one else could figure out how it worked. We
drove our wire connection to our business partner at 110 percent of
its rated bandwidth, as measured by our LAN tech, and moved 8TB of
data in three days. It worked, it was clean, but it was no obvious
unless you understood Windows.

The Interbase model of allowing concurrent queries on one connection
is, last I heard, not a preferred approach for reliable systems.

Back to the business at hand:

When extended to allow for lookasides for an LDAP-like system, the
example thread mainline code becomes (very crudely):

void socketConectionThreadRun ()
{
while (connected)
{
request = receive();
security.authorizeRequest (request);

switch (request)
{
case START: results = startTransaction ();
case EXEC_STATEMENT: results = executeStatement ();
case EXEC_SELECT: results = executeSelect ();
case COMMIT: results = commit ();
case ROLLBACK: results = rollback ();
else
}

send (results);
}
}

executeSelect (request)
{
preparedStatement = prepareStatement (request);

security.authorizeStatement (request, statement);

while (statement.hasNext ()) {
rowBuffer = statement.fetchRow ();

rowBuffer = security.authorizeRow (request, rowBuffer);
if (rowBuffer != null)
{
result.append (rowBuffer);
}
}

preparedStatement.close ();
return result;
}

Notice that the security module implementation is entirely unknown to
the engine. Even if we drill into executeSelect (), the concept of
security is known to the DBMS, but the implementation of the security
is foreign - it may be internal to the process or delegated via a
sockets connection, or you may have an elf with a tiny keyboard making
decisions based on tarot cards.

In a PAM implementing SQL92 authorization, only authorizeStatement
(from this example) would be implemented. All others would return
immediately, with a cost of roughly 70ms per 1,000,000 calls.

A PAM implementing internal LDAP-like structures or cached LDAP
authorizations would perform rapid lookups against a hash map or
(ideally) an integer indexed array of bytes.

A PAM implementing lookaside to a separate database (LDAP or the
current security database) would establish connections and perform
lookups and probably cache as much as it can in memory.

To the RDBMS, all of these PAM's appear the same, and the RDBMS is
completely unaware of which security model is in use.

> It seems to me that a better way of expressing a solution to
> the n-tier scaling problem is to separate the two concepts.
>

Agreed. LDAP like authentication and authorization does that. It is
an entirely separate issue from n-tier. As you point out so clearly,
the middle tier should appear to the level above it as a database
connection, and to the level below it as a database user.

>
> I will use the term "session" to represent a users session
> with the database and "connection" to represent a communication
> connection between two machines. Multiple sessions can run
> over a single connection. (Some discussion could be needed to
> get the terminology correct for best implementation.)
>
> - A user's client application would create a "session" request
> and pass it to the client library.
>
> - The Firebird client library would see that "session" request
> and, if it does not already have one, would create a
> "connection" to a server. That server may be a middle tier or
> an actual database server. The session request is passed to
> the server.
>
> - Middle tiers can act as pass-through or can do something
> more interesting with the traffic that comes through.
>
> - Database servers can have less communications connections
> while retaining a full list of active user sessions - from
> which authorisation etc can be managed. Presumably such
> session information will want to be light-weight, in order to
> easily manage a very large number of sessions.
>
>
> How this effects the caching of prepared statements etc I am
> not sure. I imagine that a large part of a prepare is the
> establishing of access rights - hence caching would be by
> session. This has resource implications so it may be decided
> to cache without rights, or cache rights information with the
> session and cache generic aspects globally. This is all
> implementation detail. The main point is that with a user
> session object it would become possible to validate a users
> rights at prepare time for a session because now the database
> "knows" about a given user session.

The problem here is that in many applications you have no way of
knowing when a session is done. If I decided not to submit this
email, my session would hang around using resources while a million or
more people are also maintaining sessions. If a session requires 1K
of resources to maintain, then we are talking about 1 GB worth of
memory for maintaining sessions on a few hundred connections. IMHO,
the 1Gb worth of memory is far better used for data buffers than
holding session information.

It is far more efficient for large scale apps to limit the "session"
to the transaction. The security look-aside would use cached data
when it was available, and perform physical I/O as required.

>
> Note also. Another reason for middle tiers is to offload
> various resources to distributed servers. It could be possible
> to create (trusted) cooperating servers that allow some of the
> caching to take place at the middle tiers while still retaining
> the central definition of rights etc. In such situations a
> client need never know whether it is talking to a middle tier
> server or not.

As soon as you have a "trusted" connection, you have a security hole.
Your worst enemy is not outside hackers, but (ironically) disgruntled
computer security administrators. Security = trust no one without
proper credentials. LDAP is all about addressing set of issues.

>
> Such a solution could also be of benefit to installations that
> do not use n-tier. For example I have an application that does
> make use of multiple database connections (sessions). If
> sessions become less expensive - by operating over the same
> connection to a server - then that application would benefit.

Establishing connections is expensive. Using existing connections is
cheap. Programming a 1 connection -> 1 thread model is easy,
especially when the security questions are properly delegated to a
peripheral API. Programming a 1 conenction -> multi-thread model is
more complex (reliability and maintenance problem)

Maintaining session information in the database presumes that users
stay connected, and that it is worthwhile to keep the information
about a logged in user in memory when they may not be doing any work.
Limiting session information to machine, user, and role, with the
security questions delegated outside of the core of the DBMS means
that no session level information _needs_ to be kept under any
circumstance.

The security look-asides will use whatever is cached first, followed
by whatever it needs to get through I/O. If the mechanism is actually
implemented on top of the Firebird database, its caching and
performance will be the same as you would expect for other internal
operations in the database.

By not maintaining session level information in the database, we gain
the ability to serve those 1,000,000 concurrent users without
increasing our resource requirements appreciably. We might be slow
with that kind of workload, but we will not run out of resources.

>
> Note that establishing a "session" could/would carry information
> about the end user machine (address details etc) so that the
> servers can report, and perhaps perform validation etc, against
> such details. This ability would be lost if n-tier becomes a
> simple pool of connections.

The wire protocol must allow for proxying and identification of the
originator of the requests. But I don't think that the session needs
to be (or should be) maintained at the RDBMS level beyond the duration
of individual transactions.

N-tier proxies can help to expand the connection pools and distribute
workloads. They are a benefit to the scaling of applications, but
they do not substitute for a well-defined and robust security model.