Subject | RE: [IB-Architect] 'Order By' Issue |
---|---|
Author | Leyne, Sean |
Post date | 2000-12-20T16:02:15Z |
David,
First let me say that I believe that IB/Firebird needs to support the
TOP {x} syntax and your top 10/100 example is right on point and is a
real shortcoming for IB/Firebird, but...
With respect to your point about interactive/web applications. I
personally have a problem with people who expect the DB server to be
responsible for the navigation management you refer to. This should be
made the responsibility of a middle-tier solution (it doesn't need to be
complex...) to manage the result list and return the data as
appropriate to the web server. There are numerous benefits to this
approach, one of them is that the query is only performed once and the
result persisted (in a Delphi CDS if you want) - no server
overhead/action is then required to navigate the list.
To expect the server to perform the navigation you want could require
the query to be re-executed for each navigation - this promises to be an
extremely 'expensive' solution and would be a sure way to bring a server
to it's knees, not to talk about putting a Client session to "sleep"
Also, let's not forget that the only way to determine the size of a
result set is to navigate to the end of the set.
I realize that mySQL does provide some of the functionality you are
referring to, but then again I don't think that it comes a mile close to
being a real SQL server product.
Sean
-----Original Message-----
From: David Berg [mailto:DaveBerg@...]
Sent: Tuesday, December 19, 2000 9:57 PM
To: IB-Architect@egroups.com
Cc: 'Swor, Jim'
Subject: RE: [IB-Architect] 'Order By' Issue
RE: Why partial result sets are important.
(1) Interactive applications, including web based applications. If the
user
accidentilly requests 1,000 records you only show him the first 20. If
he
really wants the entire 1,000, we can go back for the rest. If he
requests
1,000,000 then we can be pretty sure he'll never want the entire list,
but
he may still want to look at the top 20 (or 100), particularly if the
data
is sorted so that the most meaningful items are at the top.
If I have to extract and sort the 1,000,000 that's a lot slower than
using
an index to retrieve the first 100 and then stopping.
Note that in this case, we don't always know how many records the user's
going to want to see, we just know that it's not going to be a lot.
However, since we don't generally like keeping open cursors in this
multi-tier / stateless world, it would be acceptible to let us say TOP
20,
and have that cue the optimizer to use the index to retrieve the records
(because it's going to be a lot faster if we can determine the TOP 20
records first, and then pull the rest of the data).
We're doing work with Delphi's ClientDataSets right now, to try to
manage
retrieving records in packets, because we have real problems when people
request too many records without realizing what they're asking for.
We try to clue them into the problem, but realistically, it's hard -
because
we really don't know how big the result sets going to be. We could
execute
every query twice - first with a COUNT(*), and then again going after
the
result set. That way if the count was too great we could prompt the
user,
but that's (nearly) doubling the time it would take to open every form
(I
realize that a COUNT(*) query generally executes much faster than just
getting the data, the performance hit would be worse on small data sets,
or
on complex queries where just figuring out what's in the result set
takes a
long time).
(2) Analytic applications, where we want to look at our top 10
customers, or
top 100 selling products.
-----Original Message-----
From: Jim Starkey [mailto:jas@...]
Sent: Wednesday, December 06, 2000 10:35 AM
To: IB-Architect@egroups.com
Subject: Re: [IB-Architect] 'Order By' Issue
At 05:49 PM 12/6/00 +0100, Ivan Prenosil wrote:
is not the same. Large sequential operations are much faster
per byte than successive ordered reads and successive ordered
read are a great deal faster than random access. The index
retrieval scheme through an intermediate bitmap guarentees
that records are fetched as close to successive ordered reads
possible. Sort runs are written as single writes 64KB (or
whatever -- should be a MB). The cost of writing the sort
run is probably less than reading two random pages. Second,
if the sort buffer doesn't overflow, nothing gets written at
all. Also keep in mind that sort doesn't copy the engine
record, only fields referenced.
Jim Starkey
To unsubscribe from this group, send an email to:
IB-Architect-unsubscribe@onelist.com
To unsubscribe from this group, send an email to:
IB-Architect-unsubscribe@onelist.com
First let me say that I believe that IB/Firebird needs to support the
TOP {x} syntax and your top 10/100 example is right on point and is a
real shortcoming for IB/Firebird, but...
With respect to your point about interactive/web applications. I
personally have a problem with people who expect the DB server to be
responsible for the navigation management you refer to. This should be
made the responsibility of a middle-tier solution (it doesn't need to be
complex...) to manage the result list and return the data as
appropriate to the web server. There are numerous benefits to this
approach, one of them is that the query is only performed once and the
result persisted (in a Delphi CDS if you want) - no server
overhead/action is then required to navigate the list.
To expect the server to perform the navigation you want could require
the query to be re-executed for each navigation - this promises to be an
extremely 'expensive' solution and would be a sure way to bring a server
to it's knees, not to talk about putting a Client session to "sleep"
Also, let's not forget that the only way to determine the size of a
result set is to navigate to the end of the set.
I realize that mySQL does provide some of the functionality you are
referring to, but then again I don't think that it comes a mile close to
being a real SQL server product.
Sean
-----Original Message-----
From: David Berg [mailto:DaveBerg@...]
Sent: Tuesday, December 19, 2000 9:57 PM
To: IB-Architect@egroups.com
Cc: 'Swor, Jim'
Subject: RE: [IB-Architect] 'Order By' Issue
RE: Why partial result sets are important.
(1) Interactive applications, including web based applications. If the
user
accidentilly requests 1,000 records you only show him the first 20. If
he
really wants the entire 1,000, we can go back for the rest. If he
requests
1,000,000 then we can be pretty sure he'll never want the entire list,
but
he may still want to look at the top 20 (or 100), particularly if the
data
is sorted so that the most meaningful items are at the top.
If I have to extract and sort the 1,000,000 that's a lot slower than
using
an index to retrieve the first 100 and then stopping.
Note that in this case, we don't always know how many records the user's
going to want to see, we just know that it's not going to be a lot.
However, since we don't generally like keeping open cursors in this
multi-tier / stateless world, it would be acceptible to let us say TOP
20,
and have that cue the optimizer to use the index to retrieve the records
(because it's going to be a lot faster if we can determine the TOP 20
records first, and then pull the rest of the data).
We're doing work with Delphi's ClientDataSets right now, to try to
manage
retrieving records in packets, because we have real problems when people
request too many records without realizing what they're asking for.
We try to clue them into the problem, but realistically, it's hard -
because
we really don't know how big the result sets going to be. We could
execute
every query twice - first with a COUNT(*), and then again going after
the
result set. That way if the count was too great we could prompt the
user,
but that's (nearly) doubling the time it would take to open every form
(I
realize that a COUNT(*) query generally executes much faster than just
getting the data, the performance hit would be worse on small data sets,
or
on complex queries where just figuring out what's in the result set
takes a
long time).
(2) Analytic applications, where we want to look at our top 10
customers, or
top 100 selling products.
-----Original Message-----
From: Jim Starkey [mailto:jas@...]
Sent: Wednesday, December 06, 2000 10:35 AM
To: IB-Architect@egroups.com
Subject: Re: [IB-Architect] 'Order By' Issue
At 05:49 PM 12/6/00 +0100, Ivan Prenosil wrote:
>>There are two problems with this analysis. First, all IO
>> It is generally (i.e. always) faster to make a sequential pass
>> through selected records and sort the results than to bounce
>> between the index and the data pages -- a quicksort, even with
>> a merge, is faster than a page read.
>
>Unfortunately "generally" does not always mean "always".
>
>E.g. you have wide rows and ony one row fit on page
>(perhaps not typical table, but good for this example);
>then ordering by index means that you will have to read
>index (only small amount of data), and then each data page
>only once. On the other hand using sort files means (approximately)
>that each row is read from datapage, written to sort file,
>read from sort file, i.e. 3 times more i/o operations.
>
is not the same. Large sequential operations are much faster
per byte than successive ordered reads and successive ordered
read are a great deal faster than random access. The index
retrieval scheme through an intermediate bitmap guarentees
that records are fetched as close to successive ordered reads
possible. Sort runs are written as single writes 64KB (or
whatever -- should be a MB). The cost of writing the sort
run is probably less than reading two random pages. Second,
if the sort buffer doesn't overflow, nothing gets written at
all. Also keep in mind that sort doesn't copy the engine
record, only fields referenced.
>Retrieving only part of result set is also important,Could you explain this, please?
>especially with internet applications.
>
Jim Starkey
To unsubscribe from this group, send an email to:
IB-Architect-unsubscribe@onelist.com
To unsubscribe from this group, send an email to:
IB-Architect-unsubscribe@onelist.com