Subject: RE: [IB-Architect] 'Order By' Issue
Author: Leyne, Sean
Jason,

I'm afraid that either you missed my point or I wasn't clear...

First, my response was based on the client being a web browser and the
navigation refers to paging through the resulting web pages.

That being said, we have, in fact, implemented an extended version of
this approach in a very extensive application; it provides sub-second
synchronization as well as data consistency coordination/management.

I did not expect that the middleware/application server would be doing
any large-scale database caching. Thus, there would be no need to keep
the data synchronized, and the number and volume of database changes
wouldn't matter.

I would, however, cache the result set for each client search request;
each new set of query criteria would purge the last client result set.

Using this solution, any and all navigation would be performed against
the middleware's cached result set. No server operation would be
required beyond performing and returning the initial SELECT results.

To use the SELECT TOP {X}, {Y} syntax (as mySQL supports) would in fact
require that the database server re-perform the query and then extract
the appropriate result set. Thus, each navigation would result in
extensive database server activity.
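
For example, paging with MySQL's actual spelling of this feature, the
LIMIT {offset}, {count} clause (the table and column names below are
only placeholders):

  -- Page 1 (rows 1-20): the server runs the full query.
  SELECT cust_name, order_total
  FROM orders
  ORDER BY order_total DESC
  LIMIT 0, 20;

  -- Page 3 (rows 41-60): the same query is re-run with a new offset.
  SELECT cust_name, order_total
  FROM orders
  ORDER BY order_total DESC
  LIMIT 40, 20;

Each page the client views repeats the full SELECT and sort on the
server; only the slice that is returned changes.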

Yes, using a middleware solution would require only a single database
request to satisfy all subsequent client navigation requests. Equally,
the client would only "suffer" the performance hit of the initial
query; all subsequent navigation would be "instantaneous".

Finally, if a huge result set is an issue (your 1 million rows), then I
would use TOP {x} (where x is the maximum size) to govern/limit the
result set to an "acceptable" overall size.
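
In other words, something like this (TOP is the MS SQL-style spelling,
which IB/Firebird doesn't have yet; the table and WHERE clause are just
placeholders):

  -- Cap the result set handed to the middleware at 1,000 rows.
  SELECT TOP 1000 cust_name, order_total
  FROM orders
  WHERE region = 'WEST'
  ORDER BY order_total DESC;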

By the way, who said the application server isn't running on the same
box as the database? Also, if it's not, I'm sure that a 100Mb or 1Gb
network connection could make the issue of transferring the result set
to the middleware moot.


Sean

-----Original Message-----
From: Jason Wharton [mailto:jwharton@...]
Sent: Wednesday, December 20, 2000 12:21 PM
To: IB-Architect@egroups.com
Subject: Re: [IB-Architect] 'Order By' Issue

Sean,

> With respect to your point about interactive/web applications: I
> personally have a problem with people who expect the DB server to be
> responsible for the navigation management you refer to. This should be
> made the responsibility of a middle-tier solution (it doesn't need to
> be complex...) to manage the result list and return the data as
> appropriate to the web server. There are numerous benefits to this
> approach; one of them is that the query is only performed once and the
> result persisted (in a Delphi CDS if you want) - no server
> overhead/action is then required to navigate the list.
>
> To expect the server to perform the navigation you want could require
> the query to be re-executed for each navigation - this promises to be
> an extremely 'expensive' solution and would be a sure way to bring a
> server to its knees, not to talk about putting a Client session to
> "sleep".

The server isn't navigating. It cannot do such a thing. It is a
set-based SQL engine. What he wants to do is surgically grab just the
records of interest and avoid having to "navigate" among lots of
records in which there is no actual interest.

> Also, let's not forget that the only way to determine the size of a
> result set is to navigate to the end of the set.

His point is he wants a system that works independent of the size of the
dataset. IOW, he doesn't care how big it is.

> I realize that mySQL does provide some of the functionality you are
> referring to, but then again I don't think that it comes within a
> mile of being a real SQL server product.

With all due respect, your middleware solution for this would be
fraught with quirks and problems if someone had any significant volume
of data. I think it would be far superior to have a database that
works as suggested than to go through all the hoops of maintaining
essentially a replicated version of the database in virtual RAM on
another (or the same) machine.

What if the data is changing frequently? Requery and store 1 million
records in a CDS on another machine? The thought of that makes me
nauseous.

Or, execute a simple query with a WHERE and ORDER BY clause on the
server, fetching just a few records? I can easily think of most
high-volume systems not even being able to keep the middle tiers up to
date, let alone serving out any data.
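
To make that concrete, something along these lines is all that's being
asked of the server (the row-limiting clause is hypothetical for IB,
and the table and column names are made up):

  -- Fetch only the 20 rows of interest, ordered by an indexed column;
  -- no middle-tier cache, no million-row transfer.
  SELECT TOP 20 cust_name, last_order_date
  FROM customers
  WHERE region = 'WEST'
  ORDER BY last_order_date DESC;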

FWIW,
Jason Wharton
CPS - Mesa AZ
http://www.ibobjects.com


----- Original Message -----
From: "Leyne, Sean" <InterbaseArchitecture@...>
To: <IB-Architect@egroups.com>
Sent: Wednesday, December 20, 2000 9:02 AM
Subject: RE: [IB-Architect] 'Order By' Issue


> David,
>
> First, let me say that I believe IB/Firebird needs to support the
> TOP {x} syntax; your top 10/100 example is right on point, and the
> lack of such syntax is a real shortcoming for IB/Firebird, but...
>
>
>
> Sean
>
> -----Original Message-----
> From: David Berg [mailto:DaveBerg@...]
> Sent: Tuesday, December 19, 2000 9:57 PM
> To: IB-Architect@egroups.com
> Cc: 'Swor, Jim'
> Subject: RE: [IB-Architect] 'Order By' Issue
>
> RE: Why partial result sets are important.
>
> (1) Interactive applications, including web-based applications. If the
> user accidentally requests 1,000 records you only show him the first
> 20. If he really wants the entire 1,000, we can go back for the rest.
> If he requests 1,000,000 then we can be pretty sure he'll never want
> the entire list, but he may still want to look at the top 20 (or 100),
> particularly if the data is sorted so that the most meaningful items
> are at the top.
>
> If I have to extract and sort the 1,000,000 that's a lot slower than
> using an index to retrieve the first 100 and then stopping.
>
> Note that in this case, we don't always know how many records the
> user's going to want to see; we just know that it's not going to be a
> lot. However, since we don't generally like keeping open cursors in
> this multi-tier / stateless world, it would be acceptable to let us
> say TOP 20, and have that cue the optimizer to use the index to
> retrieve the records (because it's going to be a lot faster if we can
> determine the TOP 20 records first, and then pull the rest of the
> data).
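>
> Something like this is all we'd need (TOP here is the MS SQL spelling,
> and the table is made up):
>
>   -- Let the optimizer walk the index that satisfies the ORDER BY and
>   -- stop after 20 rows, instead of sorting the whole table.
>   SELECT TOP 20 part_no, descr, qty_on_hand
>   FROM parts
>   ORDER BY part_no;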
>
> We're doing work with Delphi's ClientDataSets right now, to try to
> manage retrieving records in packets, because we have real problems
> when people request too many records without realizing what they're
> asking for.
>
> We try to clue them in to the problem, but realistically, it's hard -
> because we really don't know how big the result set's going to be. We
> could execute every query twice - first with a COUNT(*), and then
> again going after the result set. That way if the count was too great
> we could prompt the user, but that's (nearly) doubling the time it
> would take to open every form (I realize that a COUNT(*) query
> generally executes much faster than just getting the data; the
> performance hit would be worse on small data sets, or on complex
> queries where just figuring out what's in the result set takes a long
> time).
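>
> The two-pass approach would look roughly like this (the table and
> dates are made up):
>
>   -- Pass 1: how big would the result set be?
>   SELECT COUNT(*)
>   FROM orders
>   WHERE ship_date >= '2000-01-01';
>
>   -- Pass 2: run only if the count was acceptable (or the user said OK).
>   SELECT order_no, cust_name, order_total
>   FROM orders
>   WHERE ship_date >= '2000-01-01'
>   ORDER BY order_no;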
>
> (2) Analytic applications, where we want to look at our top 10
> customers or top 100 selling products.
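>
> For example, a "top 10 customers by sales" query, again with a
> hypothetical TOP clause and a made-up table:
>
>   SELECT TOP 10 cust_name, SUM(order_total)
>   FROM orders
>   GROUP BY cust_name
>   ORDER BY 2 DESC;  -- order by the aggregate (second column)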
>
> -----Original Message-----
> From: Jim Starkey [mailto:jas@...]
> Sent: Wednesday, December 06, 2000 10:35 AM
> To: IB-Architect@egroups.com
> Subject: Re: [IB-Architect] 'Order By' Issue
>
>
> At 05:49 PM 12/6/00 +0100, Ivan Prenosil wrote:
> >>
> >> It is generally (i.e. always) faster to make a sequential pass
> >> through selected records and sort the results than to bounce
> >> between the index and the data pages -- a quicksort, even with
> >> a merge, is faster than a page read.
> >
> >Unfortunately "generally" does not always mean "always".
> >
> >E.g. you have wide rows and only one row fits on a page
> >(perhaps not a typical table, but good for this example);
> >then ordering by index means that you will have to read the
> >index (only a small amount of data), and then each data page
> >only once. On the other hand, using sort files means (approximately)
> >that each row is read from a data page, written to a sort file, and
> >read from the sort file, i.e. 3 times more i/o operations.
> >
>
> There are two problems with this analysis. First, all IO
> is not the same. Large sequential operations are much faster
> per byte than successive ordered reads, and successive ordered
> reads are a great deal faster than random access. The index
> retrieval scheme through an intermediate bitmap guarantees
> that records are fetched as close to successive ordered reads as
> possible. Sort runs are written as single 64KB writes (or
> whatever -- should be a MB). The cost of writing the sort
> run is probably less than reading two random pages. Second,
> if the sort buffer doesn't overflow, nothing gets written at
> all. Also keep in mind that sort doesn't copy the engine
> record, only the fields referenced.
>
> >Retrieving only part of a result set is also important,
> >especially with internet applications.
> >
>
> Could you explain this, please?
>
>
> Jim Starkey
>


To unsubscribe from this group, send an email to:
IB-Architect-unsubscribe@onelist.com