Subject: RE: [IB-Architect] 'Order By' Issue
From: Leyne, Sean
Jason,

Obviously, we do have very different scenarios in mind.

Sean

-----Original Message-----
From: Jason Wharton [mailto:jwharton@...]
Sent: Wednesday, December 20, 2000 10:51 PM
To: IB-Architect@egroups.com
Subject: Re: [IB-Architect] 'Order By' Issue

Sean,

> I'm afraid that either you missed my point or I wasn't clear...

I think we must have different scenarios in mind.

> First, my response was based on the client being a web browser and the
> navigation refers to paging through the resulting web pages.
>
> That being said, we have, in fact, implemented an extended version of
> this approach in a very extensive application; it provides sub-second
> synchronization as well as data consistency coordination/management.

Sounds spiffy. Shall we call it MiddleBase for InterBase?

> I did not expect that the middleware/application server would be doing
> any large-scale database caching. Thus, there would be no need to
> keep the data synchronized, so the number and volume of database
> changes wouldn't matter.
>
> I would, however, cache the result set for the client search request,
> as each new set of query criteria would purge the last client result set.

So, a single hit request is going to pick up a pseudo-snapshot of the data
from an all-encompassing cached result set on a middle-tier machine?

I can see how this could work for some situations, but I can also think of
some possible traps that would be a pain to work around. Not to mention all
of the additional hardware and additional points of failure introduced.

> Using this solution, any and all navigation would be performed against
> the middleware's cached result set. No server operation would be required
> beyond performing and returning the initial SELECT results.

For every request that comes in, you make the database server do all the
work that the user might make it do in subsequent requests? And on top of
that, all of this data is potentially crossing a network connection to
boot? IOW, on the first hit you send all of the data necessary for any
subsequent hits to the middle tier. This seems horribly inefficient to me.

Are your users consistently paging through all the records potentially
served up from the cache? If they are like me, they do not scroll through
the remaining pages of records, so the middle-tier cache is pretty much
useless 90% of the time. What is your utilization of the cached
information? Are you able to send off the first page before the rest of it
is accumulated?

I thought a middle-tier-based system would primarily be for pooling
sessions to allow efficient "mutexed" access to a database server,
protecting it from getting blown out of the water by too many simultaneous
connection requests. For example, let's say you have a potential of 5,000
clients during a typical work day. There is nothing that would prevent
them from all attempting to connect at the same time. If this were to
happen, it would kill InterBase for sure. Keep in mind that InterBase
allocates RAM for the server process based on the number of connections
and the database's page size (which is awful if you ask me; I think it
should be smarter than that).

So, a person with a potentially huge number of clients would look to a
multi-tier system to cache the connections and queue things up such that
it would serialize the requests through a fixed number of sessions to a
database to avoid overload, and also distribute some of the other business
processes associated with server-side operations.

Thus, if these 5,000 users all wanted to know the current quote for
Dreyer's Ice-Cream, etc., their requests would stack up and get funneled
through, let's say, 5 machines chained together for load balancing. If
each machine handled 40 connections and stacked up all requests to go
through them, then there is a possibility of 200 simultaneous connections.
Thus, each connection would have 25 hits to process. If each hit was
handled in 250 milliseconds, that means all 5,000 requests could be
handled in about 6 seconds. That's not too bad. Especially since
replication isn't even brought into the picture yet. On the right
hardware/platform InterBase could do this all day long.
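As a sanity check, the pooling arithmetic can be written out directly.
The numbers (5,000 clients, 5 machines, 40 pooled connections each,
250 ms per hit) are the ones from the example above; note that
5,000 / (5 × 40) works out to 25 hits per connection:

```python
# Back-of-envelope check of the connection-pooling arithmetic above.
clients = 5_000                 # potential simultaneous client requests
machines = 5                    # middle-tier machines for load balancing
connections_per_machine = 40    # pooled database connections per machine
hit_seconds = 0.250             # time to service one hit

total_connections = machines * connections_per_machine  # 200 sessions
hits_per_connection = clients / total_connections       # 25 hits each
elapsed = hits_per_connection * hit_seconds             # total drain time

print(total_connections, hits_per_connection, elapsed)
# → 200 25.0 6.25
```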

But I had never thought of a middle-tier system as a complete cache of
data for potentially large operations... I think your usage of multi-tier
is getting into "reinventing the wheel" territory.

> To use the SELECT TOP {X}, {Y} syntax (as MySQL supports) would in fact
> require that the database server re-perform the query and then extract
> the appropriate result set. Thus, each navigation would result in
> extensive database server activity.

You seem to be missing the point of what is being suggested in general
(by me, at least). I don't want the ability to do just what MySQL does. I
don't care about the number of records, and I certainly don't want to tell
it to re-execute the exact same query just to get some records further
down. That's sloppy in my book.

I just want InterBase to use a pathway that is optimized for selecting
records at the TOP of the dataset. I know this isn't possible for all
cases, but a good database designer can generally maintain statistics
tables that make it so that most complex queries can execute very quickly
and accurately from maintained tables. It's not like I have a prompt
asking the user to type their own SQL into the web browser. I know what
needs to be delivered, and I maintain the tables, etc. to deliver it
quickly. It just seems that sometimes InterBase fights me because it is
so batch-minded.

In your case you don't need this, because you are just shipping the whole
result set over to the middle tier, so you have no need for quickness on
the TOP records of the dataset. I think this is where your hang-up is.

> Yes, using a middleware solution would require a single database request
> to satisfy all subsequent client navigation requests. Equally, the
> client would only "suffer" the performance hit of the initial query; all
> subsequent navigation would be "instantaneous".

Ok, yes, this confirms my point. I think this is undesirable and fraught
with potential traps.

> Finally, if a huge result set is an issue (your 1 million rows), then I
> would use the TOP {x} (where x is the maximum size) to govern/limit the
> size of the result set to an "acceptable" overall size.

Like I said, I'm not talking about TOP x. I'm talking about telling
InterBase to be quick about getting the TOP records, because when the
user navigates to the next page of items to view, I am going to execute a
query with a new input parameter value and return just the records of
interest. I'm not executing the same query again. In fact, that's
impossible, because one hit to the next is not going to be in the same
transaction. It cannot be rownum-based; instead, it has to be based on
the data itself to be truly accurate. Unless, of course, you are storing
the results in a results table and then left-joining the pages of
information.
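The "based on the data itself" paging Jason describes is what is now
commonly called keyset (or seek) pagination: each page's query filters on
the last key value already delivered, so the server can seek straight to
the start of the next page instead of re-reading and discarding earlier
rows as a rownum/OFFSET approach would. A minimal sketch, with SQLite
standing in for InterBase and an invented `quotes` table purely for
illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?)",
    [("AAA", 1.0), ("BBB", 2.0), ("CCC", 3.0), ("DDD", 4.0), ("EEE", 5.0)],
)

PAGE_SIZE = 2

def next_page(last_symbol):
    """Fetch the page strictly after last_symbol (keyset pagination).

    The WHERE clause carries the new input parameter value for each hit,
    so no state from the previous transaction is needed -- each page is
    an independent, index-friendly query.
    """
    return conn.execute(
        "SELECT symbol, price FROM quotes "
        "WHERE symbol > ? ORDER BY symbol LIMIT ?",
        (last_symbol, PAGE_SIZE),
    ).fetchall()

page1 = next_page("")            # first page: every symbol sorts after ""
page2 = next_page(page1[-1][0])  # seek past the last key of page 1
print(page1)  # [('AAA', 1.0), ('BBB', 2.0)]
print(page2)  # [('CCC', 3.0), ('DDD', 4.0)]
```

Because the key comes from the data itself, the result stays accurate
even if rows are inserted or deleted between hits, which a row-number
offset cannot guarantee.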

> By the way, who said the application server isn't running on the same
> box as the database? Also, if it's not, I'm sure that a 100Mb or 1Gb
> network connection could make the issue of transferring the result set
> to the middleware moot.

I didn't say it was, but it would seem rather absurd to have a middle-tier
system all set up just to run on a single machine. Doesn't that defeat one
of the main benefits of middle-tier services, namely the ability to run
them on multiple machines?

PS. Thanks for letting me rattle you around a bit here. I think this is an
area that many have questions about, so our little spats are bound to do
some good. <g>

FWIW,
Jason Wharton
CPS - Mesa AZ
http://www.ibobjects.com


