Subject RE: [IB-Architect] 'Order By' Issue
Author David Berg
RE: Why partial result sets are important.

(1) Interactive applications, including web based applications. If the user
accidentilly requests 1,000 records you only show him the first 20. If he
really wants the entire 1,000, we can go back for the rest. If he requests
1,000,000 then we can be pretty sure he'll never want the entire list, but
he may still want to look at the top 20 (or 100), particularly if the data
is sorted so that the most meaningful items are at the top.

If I have to extract and sort the 1,000,000 that's a lot slower than using
an index to retrieve the first 100 and then stopping.

Note that in this case, we don't always know how many records the user's
going to want to see, we just know that it's not going to be a lot.
However, since we don't generally like keeping open cursors in this
multi-tier / stateless world, it would be acceptible to let us say TOP 20,
and have that cue the optimizer to use the index to retrieve the records
(because it's going to be a lot faster if we can determine the TOP 20
records first, and then pull the rest of the data).

We're doing work with Delphi's ClientDataSets right now, to try to manage
retrieving records in packets, because we have real problems when people
request too many records without realizing what they're asking for.

We try to clue them into the problem, but realistically, it's hard - because
we really don't know how big the result sets going to be. We could execute
every query twice - first with a COUNT(*), and then again going after the
result set. That way if the count was too great we could prompt the user,
but that's (nearly) doubling the time it would take to open every form (I
realize that a COUNT(*) query generally executes much faster than just
getting the data, the performance hit would be worse on small data sets, or
on complex queries where just figuring out what's in the result set takes a
long time).

(2) Analytic applications, where we want to look at our top 10 customers, or
top 100 selling products.

-----Original Message-----
From: Jim Starkey [mailto:jas@...]
Sent: Wednesday, December 06, 2000 10:35 AM
To: IB-Architect@egroups.com
Subject: Re: [IB-Architect] 'Order By' Issue


At 05:49 PM 12/6/00 +0100, Ivan Prenosil wrote:
>>
>> It is generally (i.e. always) faster to make a sequential pass
>> through selected records and sort the results than to bounce
>> between the index and the data pages -- a quicksort, even with
>> a merge, is faster than a page read.
>
>Unfortunately "generally" does not always mean "always".
>
>E.g. you have wide rows and ony one row fit on page
>(perhaps not typical table, but good for this example);
>then ordering by index means that you will have to read
>index (only small amount of data), and then each data page
>only once. On the other hand using sort files means (approximately)
>that each row is read from datapage, written to sort file,
>read from sort file, i.e. 3 times more i/o operations.
>

There are two problems with this analysis. First, all IO
is not the same. Large sequential operations are much faster
per byte than successive ordered reads and successive ordered
read are a great deal faster than random access. The index
retrieval scheme through an intermediate bitmap guarentees
that records are fetched as close to successive ordered reads
possible. Sort runs are written as single writes 64KB (or
whatever -- should be a MB). The cost of writing the sort
run is probably less than reading two random pages. Second,
if the sort buffer doesn't overflow, nothing gets written at
all. Also keep in mind that sort doesn't copy the engine
record, only fields referenced.

>Retrieving only part of result set is also important,
>especially with internet applications.
>

Could you explain this, please?


Jim Starkey

To unsubscribe from this group, send an email to:
IB-Architect-unsubscribe@onelist.com