Subject Re: [IB-Architect] 'Order By' Issue
Author Jason Wharton
Sean,

> With respect to your point about interactive/web applications. I
> personally have a problem with people who expect the DB server to be
> responsible for the navigation management you refer to. This should be
> made the responsibility of a middle-tier solution (it doesn't need to be
> complex...) to manage the result list and return the data as
> appropriate to the web server. There are numerous benefits to this
> approach, one of them is that the query is only performed once and the
> result persisted (in a Delphi CDS if you want) - no server
> overhead/action is then required to navigate the list.
>
> To expect the server to perform the navigation you want could require
> the query to be re-executed for each navigation - this promises to be an
> extremely 'expensive' solution and would be a sure way to bring a server
> to it's knees, not to talk about putting a Client session to "sleep"

The server isn't navigating. It cannot do such a thing. It is a set based
SQL engine. What he wants to do is surgically grab just the records of
interest and ignore having to "navigate" among lots of records there is no
actual interest in.

> Also, let's not forget that the only way to determine the size of a
> result set is to navigate to the end of the set.

His point is he wants a system that works independent of the size of the
dataset. IOW, he doesn't care how big it is.

> I realize that mySQL does provide some of the functionality you are
> referring to, but then again I don't think that it comes a mile close to
> being a real SQL server product.

With all due respect your middleware solution for this would be fraught with
quirks and problems if someone had any significant volume of data. I think
it would be far superior to have a database that works as suggested than
going through all the hoops of maintaining essentially a replicated version
of the database in virtual ram on another (or the same) machine.

What if the data is changing frequently? Requery and store 1 million records
in a CDS on another machine? The thought of that makes me nauseous.

Or, execute a simple query with a WHERE and ORDER BY clause on a server
fetching just a few records? I can easily think of most high volume systems
not even being able to keep the middle tiers up to date, let alone serving
out any data.

FWIW,
Jason Wharton
CPS - Mesa AZ
http://www.ibobjects.com


----- Original Message -----
From: "Leyne, Sean" <InterbaseArchitecture@...>
To: <IB-Architect@egroups.com>
Sent: Wednesday, December 20, 2000 9:02 AM
Subject: RE: [IB-Architect] 'Order By' Issue


> David,
>
> First let me say that I believe that IB/Firebird needs to support the
> TOP {x} syntax and your top 10/100 example is right on point and is a
> real shortcoming for IB/Firebird, but...
>
>
>
> Sean
>
> -----Original Message-----
> From: David Berg [mailto:DaveBerg@...]
> Sent: Tuesday, December 19, 2000 9:57 PM
> To: IB-Architect@egroups.com
> Cc: 'Swor, Jim'
> Subject: RE: [IB-Architect] 'Order By' Issue
>
> RE: Why partial result sets are important.
>
> (1) Interactive applications, including web based applications. If the
> user
> accidentilly requests 1,000 records you only show him the first 20. If
> he
> really wants the entire 1,000, we can go back for the rest. If he
> requests
> 1,000,000 then we can be pretty sure he'll never want the entire list,
> but
> he may still want to look at the top 20 (or 100), particularly if the
> data
> is sorted so that the most meaningful items are at the top.
>
> If I have to extract and sort the 1,000,000 that's a lot slower than
> using
> an index to retrieve the first 100 and then stopping.
>
> Note that in this case, we don't always know how many records the user's
> going to want to see, we just know that it's not going to be a lot.
> However, since we don't generally like keeping open cursors in this
> multi-tier / stateless world, it would be acceptible to let us say TOP
> 20,
> and have that cue the optimizer to use the index to retrieve the records
> (because it's going to be a lot faster if we can determine the TOP 20
> records first, and then pull the rest of the data).
>
> We're doing work with Delphi's ClientDataSets right now, to try to
> manage
> retrieving records in packets, because we have real problems when people
> request too many records without realizing what they're asking for.
>
> We try to clue them into the problem, but realistically, it's hard -
> because
> we really don't know how big the result sets going to be. We could
> execute
> every query twice - first with a COUNT(*), and then again going after
> the
> result set. That way if the count was too great we could prompt the
> user,
> but that's (nearly) doubling the time it would take to open every form
> (I
> realize that a COUNT(*) query generally executes much faster than just
> getting the data, the performance hit would be worse on small data sets,
> or
> on complex queries where just figuring out what's in the result set
> takes a
> long time).
>
> (2) Analytic applications, where we want to look at our top 10
> customers, or
> top 100 selling products.
>
> -----Original Message-----
> From: Jim Starkey [mailto:jas@...]
> Sent: Wednesday, December 06, 2000 10:35 AM
> To: IB-Architect@egroups.com
> Subject: Re: [IB-Architect] 'Order By' Issue
>
>
> At 05:49 PM 12/6/00 +0100, Ivan Prenosil wrote:
> >>
> >> It is generally (i.e. always) faster to make a sequential pass
> >> through selected records and sort the results than to bounce
> >> between the index and the data pages -- a quicksort, even with
> >> a merge, is faster than a page read.
> >
> >Unfortunately "generally" does not always mean "always".
> >
> >E.g. you have wide rows and ony one row fit on page
> >(perhaps not typical table, but good for this example);
> >then ordering by index means that you will have to read
> >index (only small amount of data), and then each data page
> >only once. On the other hand using sort files means (approximately)
> >that each row is read from datapage, written to sort file,
> >read from sort file, i.e. 3 times more i/o operations.
> >
>
> There are two problems with this analysis. First, all IO
> is not the same. Large sequential operations are much faster
> per byte than successive ordered reads and successive ordered
> read are a great deal faster than random access. The index
> retrieval scheme through an intermediate bitmap guarentees
> that records are fetched as close to successive ordered reads
> possible. Sort runs are written as single writes 64KB (or
> whatever -- should be a MB). The cost of writing the sort
> run is probably less than reading two random pages. Second,
> if the sort buffer doesn't overflow, nothing gets written at
> all. Also keep in mind that sort doesn't copy the engine
> record, only fields referenced.
>
> >Retrieving only part of result set is also important,
> >especially with internet applications.
> >
>
> Could you explain this, please?
>
>
> Jim Starkey
>
> To unsubscribe from this group, send an email to:
> IB-Architect-unsubscribe@onelist.com
>
>
>
>
>
> To unsubscribe from this group, send an email to:
> IB-Architect-unsubscribe@onelist.com
>
>
>
> To unsubscribe from this group, send an email to:
> IB-Architect-unsubscribe@onelist.com
>
>
>