firebird-architect - Re: [IB-Architect] Messaging API

Subject	Re: [IB-Architect] Messaging API
Author	Jan Mikkelsen
Post date	2000-05-18T23:52:52Z

Jim Starkey <jas@...> wrote:

>At 06:11 PM 5/18/00 +1000, Jan Mikkelsen wrote:
>>Let's look at the requirements a bit.
>> ...
>>- A reliable broadcast message service. Client applications get an
>> unbroken and ordered sequence of messages from a given point.
>
>I think we can narrow the scope by asserting the following:
>
> 0. The purpose of messaging is to allow a connected program
> to track database state.
> 1. All messages originate from the server.
> 2. Messages are only delivered to clients with a connection
> to the server, and only messages posted during the
> connection are candidates for delivery.
> 3. In all supported environments, a second virtual circuit
> for message/event delivery can be established without
> undue cost.
> 4. There is no requirement for recovery of a message stream
> if the database connection is broken for any reason.

Ok, removing recovery as a requirement makes things easier. However, if
the messages are being delivered to each client over independent reliable
streams, the protocol stacks are doing the message recovery for you, on a
per-client basis. From that starting point, your reduced requirement for
recovery means less.

>Do we agree so far? If so, I think we can duck the issues of
>broadcast technologies and reliable delivery.

Removing the requirement for correctness always makes things a bit easier
(i.e. no recovery), but I think broadcast is important. Using broadcast
instead of individual reliable streams makes recovery more important.

But overall, I think you are looking at only a subset of the problem,
although your requirement 0, as stated, is essentially the full problem.

My reading is that you assume each client is only interested in a small,
infrequently changed part of the database, and there is little overlap
between the interests of clients. With this I disagree, and I think that
making that assumption will lead to a non-scalable design.

I think I saw a stock trading example being used a while ago on this thread.
I'm not suggesting that Interbase will displace NonStop-SQL as the database
of choice for exchanges, but the application is interesting nonetheless.
So, let's take a look at what really happens in a stock exchange
environment. This is roughly what the trading floor of the Stock Exchange
of Hong Kong looks like (from memory, it's been a little while):

~950 traders enter bid and sell orders, and trades occur when bid and sell
orders match. The system is sized for >100000 trades per trading day of 4
or 5 hours. For each match, there are multiple bid and sell orders. All
brokers need to be fairly informed of the current bids, sells and matches.
Brokers run a trading application on a PC in front of them, which stores a
current view of the market in memory, based on messages received from a
Tandem system.

Fairness is important, because if one trader gets information about the
market before another, he can get a financial advantage over the unlucky
trader who was the last to know.

While the application is large and outside Interbase's target market, the
model is still useful. Anyway, on to the specific points.

>>Higher Level Naming
>>
>>At the application level, I would expect a database object for the
>>message stream to be created. An application would subscribe to a
>>message stream and be given the listening details by the server.
>
>Declaring a server "channel" is a possibility, and would be a place
>to apply security constraints. A message, then, would be posted
>to a specific channel. A client, similarly, would request a connection
>to a channel. The advantages of named channels are twofold: 1)
>They naturally subset messages, and 2) They provide a security hook.
>I don't think the concept would add much to an implementation or
>runtime cost. Is the conceptual complexity worth the gain?

It is a clean model. Is the alternative cleaner? I don't think so, at
least from what I have seen so far. Why do you think this model is
conceptually more complex?

>>If filtering on a stream is to be performed, it should be done at the
>>client, otherwise detecting gaps becomes problematic.
>
>Here I disagree. In general, each client will have a specific interest
>in a very limited corner of a database. Sending all messages to all
>clients is both expensive and unnecessary. In the existing event
>mechanism, posted events with no listeners are almost free, encouraging
>fine granularity, highly specific (and therefore efficient) events. A
>mechanism in which all messages were sent to all users would require
>the application designer to be very miserly in this message architecture,
>obviating some of the benefits of the mechanism.

If clients will naturally be interested in particular corners of the
database, the designer should use multiple channels, and the clients should
subscribe to the ones that interest them. How things should be split
depends on the nature and scale of the application in question.

Where the filtering is done is probably an implementation issue. For a
broadcast implementation, broad filtering should be done on the basis of a
channel, with further filtering by the client. For a directed model, the
server should do all the filtering.
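
As a rough illustration of why client-side filtering keeps gap detection
simple in the broadcast case, here is a minimal sketch. It assumes every
message on a channel carries a sequence number starting at 0; the class
name ChannelReceiver and its fields are hypothetical and not part of any
Interbase API.

```python
from dataclasses import dataclass, field


@dataclass
class ChannelReceiver:
    """Client-side view of one broadcast channel (hypothetical sketch)."""
    next_seq: int = 0                              # sequence number expected next
    gaps: list = field(default_factory=list)       # (first_missing, last_missing)
    delivered: list = field(default_factory=list)  # payloads passed to the app

    def receive(self, seq, payload, wanted=lambda p: True):
        # Duplicates and stale retransmissions are ignored.
        if seq < self.next_seq:
            return
        # A jump in the sequence means messages were lost in transit.
        if seq > self.next_seq:
            self.gaps.append((self.next_seq, seq - 1))
        self.next_seq = seq + 1
        # Filtering happens after the sequence check, so a message the
        # client is not interested in is never mistaken for a lost one.
        if wanted(payload):
            self.delivered.append(payload)
```

If the server filtered instead, the client would see sequence numbers
with holes and could not tell a filtered message from a lost one without
extra bookkeeping.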

>Like the event mechanism, a message for which there are no listeners
>should be very, very cheap.

A message queue with no listeners is also cheap. Nobody does anything.
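
A minimal sketch of what "nobody does anything" could look like. The
Channel class and the idea of passing a function that builds the payload
are assumptions made for illustration only.

```python
class Channel:
    """In-process stand-in for a named message channel (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def post(self, make_payload):
        # With no listeners the payload is never even built, so posting
        # costs one emptiness check.
        if not self.subscribers:
            return 0
        payload = make_payload()
        for callback in self.subscribers:
            callback(payload)
        return len(self.subscribers)
```

Posting to a channel without subscribers returns 0 without calling
make_payload at all.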

>[None of this should be construed to indicate that I think messaging
> is a good idea. Yet.]

I have designed and/or implemented a few systems which do things like this,
and they are useful. There are lots of implementation issues.

>>Message Structure
>>
>>I have deliberately left the message structure undefined. I think the
>>message structure should either be defined in terms of native Interbase
>>types, or left as a blob. The actual representation should be machine
>>readable and small. Please, no mandated ASCII, and especially no XML!
>>
>
>No structure, no server side filtering, and the mechanism is too
>expensive to use. May I remind you that ASCII means American
>Standard Code for Information Interchange? We're in the information
>interchange business, so we should use the applicable standards.

I've missed something: exactly how is the mechanism too expensive to use?

Clearly if the information to be sent is ASCII, it should be ASCII. But if
the information to be communicated is a number, there is no need to send an
ASCII representation of the number. Why shouldn't I be able to declare a
message queue in terms of the types used by the application (domains and
all), and get type safety? Why should I have to go through contortions to
represent my data in ASCII if that isn't its natural form?
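
To make the point concrete, here is a sketch of a typed, fixed-size
binary message using Python's standard struct module. The message layout
(order id, price in cents, quantity) is invented for the example and is
not a proposed wire format.

```python
import struct

# Hypothetical trade message: order id (uint32), price in cents (int64),
# quantity (uint32), in network byte order with no padding.
TRADE = struct.Struct("!IqI")


def encode_trade(order_id, price_cents, quantity):
    # struct raises struct.error if a field has the wrong type or range,
    # which gives a crude form of type checking at the sender.
    return TRADE.pack(order_id, price_cents, quantity)


def decode_trade(data):
    return TRADE.unpack(data)
```

Every such message is exactly TRADE.size (16) bytes, and the receiver
gets integers back without parsing text.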

>>Security
>>
>>Submitting messages to the queue would require a database connection with
>>appropriate permissions. If you really care about who listens or about
>>message authentication, encrypt and/or sign the messages.
>
>Surely you agree that access control is appropriate for database
>tables. Message streams, though transient, carry information. If
>the granularity of access control were the database, then there
>would be no reason to control access to messages. But since we
>support a model where a client is allowed to see salary data,
>for example, does it make sense to allow him to see messages
>containing changes to salaries? I don't think so. If a message
>facility is going to be useful, it has to support the same level
>of security as the tables from which the message content is derived.

I agree. That's what I meant by "appropriate permissions". Sorry if I
wasn't clear.
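
A sketch of the channel as a security hook, assuming a simple model in
which each channel has a set of users granted read access. The
SecuredChannel class and the grant model are hypothetical, not the
Interbase permission system.

```python
class SecuredChannel:
    """Channel that checks a read grant at subscribe time (hypothetical)."""

    def __init__(self, name, readers):
        self.name = name
        self.readers = set(readers)   # users granted read access (assumed)
        self.subscribers = {}

    def subscribe(self, user, callback):
        # The check happens once, when the client asks to listen.
        if user not in self.readers:
            raise PermissionError(f"{user} may not listen on {self.name}")
        self.subscribers[user] = callback

    def post(self, payload):
        for callback in self.subscribers.values():
            callback(payload)
```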

How I see things: The directed model requires that server memory, server
processing and network bandwidth all grow linearly with the number of
clients, although protocol design issues are delegated to the underlying
network stack, superficially making the implementation look simpler. With
the broadcast model you get scalability at the expense of implementation
complexity.

Is that a fair summary?
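
To put rough numbers on the summary above, here is a back-of-envelope
sketch using the trading floor figures from earlier in this message. It
assumes every client wants every message and ignores retransmissions in
the broadcast case; the function name is made up for the example.

```python
def server_sends(clients, messages, broadcast):
    """Count transmissions the server performs (simplified model)."""
    # Directed: one copy per client per message.
    # Broadcast: one copy per message, whatever the number of clients.
    return messages if broadcast else clients * messages


directed = server_sends(950, 100_000, broadcast=False)   # 95,000,000
broadcast = server_sends(950, 100_000, broadcast=True)   # 100,000
```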

Jan Mikkelsen