Subject: Re: [IB-Architect] 'Order By' Issue
Author: Jason Wharton
Sean,

> I'm afraid that either you missed my point or I wasn't clear...

I think we must have different scenarios in mind.

> First, my response was based on the client being a web browser and the
> navigation refers to paging through the resulting web pages.
>
> That being said, we have, in fact, implemented an extended version of
> this approach in a very extensive application; it provides sub-second
> synchronization as well as data consistency coordination/management.

Sounds spiffy. Shall we call it MiddleBase for InterBase?

> I did not expect that the middle-ware/application server would be doing
> any large scale database caching. Thus, there would be no need to
> maintain the data synchronized, thus the number and volume of database
> changes wouldn't matter.
>
> I would, however, cache the result set for the client search request;
> each new set of query criteria would purge the last client result set.

So, a single hit request is going to pick up a pseudo snapshot of the data
from an all encompassing cached result set on a middle tier machine?

I can see how this could work for some situations, but I can also think of
some traps that would be a pain to work around to avoid getting bitten,
not to mention all of the additional hardware and additional points of
failure being introduced.

> Using this solution, any and all navigation would be performed against
> the middleware's cached result set. No server operation would be
> required beyond performing and returning the initial SELECT results.

For every request that comes in, you make the database server do all the work
the user might make it do in subsequent requests? And on top of that, all of
this data is potentially crossing a network connection to boot? In other
words, on the first hit you send all of the data necessary for any subsequent
hits to the middle tier. This seems horribly inefficient to me.

Are your users consistently paging through all the records potentially served
up from the cache? If they are like me, they do not scroll through the
remaining pages of records, so the middle tier cache is pretty much useless
90% of the time. What is your utilization of the cached information? Are you
able to send off the first page before the rest of it is accumulated?

I thought a middle tier system would primarily be for pooling sessions
to allow efficient "mutexed" access to a database server, protecting it from
getting blown out of the water by too many simultaneous connection requests.
For example, let's say you have a potential of 5,000 clients during a typical
work day. There is nothing that would prevent them from all attempting to
connect at the same time. If this were to happen it would kill InterBase for
sure. Keep in mind that InterBase allocates RAM for the server process based
on the number of connections and the database page size (which is awful
if you ask me; I think it should be smarter than that).

So, a person with a potentially huge number of clients would look to a
multi-tier system to cache the connections and queue things up, serializing
the requests through a fixed number of sessions to the database to avoid
overload, and also to distribute some of the other business processes
associated with server-side operations.
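
Roughly, the kind of pooling I have in mind looks like the sketch below. This
is just a sketch in Python, assuming a DB-API 2.0 driver (I use kinterbasdb
purely as an example); the pool size, DSN, user and password are all made up:

    import queue
    import kinterbasdb  # any DB-API 2.0 driver for InterBase would do here

    POOL_SIZE = 40  # fixed number of database sessions per middle tier box

    class SessionPool:
        # Serialize an arbitrary number of client requests through a
        # small, fixed set of pre-opened database sessions.
        def __init__(self, size=POOL_SIZE):
            self._sessions = queue.Queue()
            for _ in range(size):
                self._sessions.put(kinterbasdb.connect(
                    dsn='server:/data/app.gdb',    # placeholder DSN
                    user='SYSDBA', password='masterkey'))

        def run(self, sql, params=()):
            con = self._sessions.get()       # block until a session is free
            try:
                cur = con.cursor()
                cur.execute(sql, params)
                rows = cur.fetchall()
                con.commit()
                return rows
            finally:
                self._sessions.put(con)      # hand the session to the next hit

Nothing clever there; the queue is what keeps 5,000 eager browsers from
turning into 5,000 simultaneous InterBase connections.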

Thus, if these 5,000 users all wanted to know the current quote for Dreyer's
Ice Cream, their requests would stack up and get funneled through, let's say,
5 machines chained together for load balancing. If each machine handled 40
connections and stacked up all requests to go through them, that means a
possibility of 200 simultaneous connections, so each connection would have
about 25 hits to process. If each hit were handled in 250 milliseconds, all
5,000 requests could be handled in roughly 6 seconds. That's not too bad,
especially since replication isn't even brought into the picture yet. On the
right hardware/platform InterBase could do this all day long.
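
Just to put numbers on it (same assumptions as above, nothing more than
arithmetic):

    machines      = 5      # load-balanced middle tier boxes
    sessions_each = 40     # pooled database sessions per box
    clients       = 5000   # worst case: everybody hits at once
    secs_per_hit  = 0.25   # 250 milliseconds per request

    sessions   = machines * sessions_each   # 200 simultaneous sessions
    hits_each  = clients / sessions         # 25 queued hits per session
    drain_time = hits_each * secs_per_hit   # about 6 seconds to clear the queue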

But I had never thought of a middle tier system as a complete cache of data
for potentially large operations... I think your usage of multi-tier is
getting into "reinventing the wheel" territory.

> To use the SELECT TOP {X}, {Y} syntax (as mySQL supports) would in fact
> require that the database server re-perform the query and then extract
> the appropriate result set. Thus, each navigation would result in
> extensive database server activity.

You seem to be missing the point of what is being suggested in general (by
me at least). I don't want the ability to do just what MySQL does. I don't
care about the number of records and I certainly don't want to tell it to
re-execute the exact same query just to get some records further down.
That's sloppy in my book.

I just want InterBase to use a pathway that is optimized for selecting
records at the TOP of the dataset. I know this isn't possible in all cases,
but a good database designer can generally maintain statistic tables that
make it so that most complex queries can execute very quickly and
accurately. It's not like I have a prompt asking the user to type their own
SQL into the web browser. I know what needs to be delivered, and I maintain
the tables, etc. to deliver it quickly. It just seems that sometimes
InterBase fights me because it is so batch-minded.
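
To be concrete about what "quick about the TOP records" means on the web side
of things, here is the shape of a first-page hit. Again just a sketch over
DB-API; the table, columns and category value are invented for illustration,
and I'm assuming the ORDER BY column is indexed:

    PAGE_SIZE = 20

    def first_page(con):
        # Ordered query on an indexed column; only one page's worth of rows
        # is ever fetched, then the cursor is closed.  The wish is that the
        # engine optimizes for handing back these first rows quickly instead
        # of acting as though the whole result set were wanted.
        cur = con.cursor()
        cur.execute(
            "SELECT item_id, item_name, price "
            "FROM items "
            "WHERE category = ? "
            "ORDER BY item_name",
            ('ICE_CREAM',))
        rows = cur.fetchmany(PAGE_SIZE)
        cur.close()
        return rows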

In your case you don't need this because you are just shipping the whole
result set over to the middle tier, so you have no need for quick retrieval
of the TOP records of the dataset. I think this is where your hang-up is.

> Yes, using a middleware solution would require a single database request
> to satisfy all subsequent client navigation requests. Equally, the
> client would only "suffer" the performance hit of the initial query; all
> subsequent navigation would be "instantaneous".

Ok, yes, this confirms my point. I think this is undesirable and fraught
with potential traps.

> Finally, if a huge result set is an issue (your 1 million rows), then I
> would use TOP {x} (where x is the maximum size) to govern/limit the size
> of the result set to an "acceptable" overall size.

Like I said, I'm not talking about TOP x. I'm talking about telling
InterBase to be quick about returning the TOP records, because when the user
navigates to the next page of items to view, I am going to execute a query
with a new input parameter value and return just the records of interest.
I'm not executing the same query again; in fact, that's impossible, because
one hit to the next is not going to be in the same transaction. It cannot be
rownum-based; it has to be based on the data itself to be truly accurate.
Unless, of course, you are storing the results in a results table and then
left-joining the pages of information.
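
In other words, the "next page" hit runs in a brand new transaction and keys
off the data itself: the last value the user saw comes back in as a
parameter. Same invented names as the sketch above; a real version would
order by a unique column (or add item_id as a tiebreaker) so pages never
skip or repeat rows:

    def next_page(con, last_name_seen, page_size=20):
        # Fresh hit, fresh transaction: the bookmark is the last item_name
        # the user saw, not a row offset, so the page boundary is defined
        # by the data rather than by a rownum.
        cur = con.cursor()
        cur.execute(
            "SELECT item_id, item_name, price "
            "FROM items "
            "WHERE category = ? AND item_name > ? "
            "ORDER BY item_name",
            ('ICE_CREAM', last_name_seen))
        rows = cur.fetchmany(page_size)
        cur.close()
        return rows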

> By the way, who said the application server isn't running on the same
> box as the database? Also, if it's not, I'm sure that a 100Mb or 1Gb
> network connection could make the issue of transferring the result set
> to the middleware moot.

I didn't say it was, but it would seem rather absurd to have a middle tier
system all set up just to run on a single machine. Doesn't that defeat one of
the main benefits of having middle tier services, namely the ability to run
them on multiple machines?

PS. Thanks for letting me rattle you around a bit here. I think this is an
area that many have questions about, so our little spats are bound to do some
good. <g>

FWIW,
Jason Wharton
CPS - Mesa AZ
http://www.ibobjects.com