firebird-architect - Re: [IB-Architect] 'Order By' Issue

Subject	Re: [IB-Architect] 'Order By' Issue
Author	Corey Wangler
Post date	2001-01-02T04:08:11Z

Hi Jason,

I think there are two separate issues here...

TOP {n} would be handy to limit the amount of resources
(memory, disk) that the database engine uses if it needs
to generate the whole result set for sorting. Say, for
example, the user enters minimal search criteria that
ends up producing a 10 million record result set that
needs sorting.... we could limit the resource usage by
using TOP 10000, and even though it would go thru all
records in the result set for ordering purposes, only those
within the 10000 window would be kept in the sort file (other
records would be discarded during the fetching). Keeping
the resource usage small may allow the sorting to occur in
memory too, so that would be faster than generating a
humungous temporary sort file.... it would be nice if
the implementation of TOP {n} could ensure that!

If ordering can be done *without* a temporary result set,
e.g. via index navigation, you would get results very
quickly (minimal processing before returning a record).
Can the database engine currently handle this? What I'm
thinking here is that if it can use an index for
navigation, couldn't the engine just go to the first
record that suits the criteria, and then return it?
Subsequent fetch(s) would step through the index and
stop at the next valid record and return it... until
reaching the end and returning EOF. Even bettter, the
engine could keep one step ahead... returning a record
and then finding the next straight away so as to
minimize fetch response time.

I'm not sure what the engine currently does in these
areas, but such features could help to write speedy
applications.

FWIW,
Corey.

Corey Wangler
Designed Software Solutions
Adelaide, Australia.

====================================================
Jason Wharton wrote:

>
> . . .
>
> > To use the SELECT TOP {X}, {Y} syntax (as mySQL supports) would in fact
> > require that the database server re-perform the query and then extract
> > the appropriate result set. Thus, each navigation would result in
> > extensive database server activity.
>
> You seem to be missing the point of what is being suggested in general (by
> me at least). I don't want the ability to do just what MySQL does. I don't
> care about the number of records and I certainly don't want to tell it to
> re-execute the exact same query just to get some records further down.
> That's sloppy in my book.
>
> I just want InterBase to use a pathway that is optimized for selecting
> records at the TOP of the dataset. I know this isn't possible for all cases
> but a good database designer can generally maintain statistic tables that
> makes it so that most complex queries can execute very quickly and
> accurately from maintained tables. It's not like I have a prompt asking for
> the user to type their own SQL in the web-browser. I know what needs to be
> delivered and I maintain the tables, etc. to deliver it quickly. It just
> seems that sometimes InterBase fights me because it is so batch minded.
>
> In your case you don't need this because you are just shipping the whole
> result set over to the middle tier so you have no need for quickness of the
> TOP records of the dataset. I think this is where your hang-up is.
>
> . . .
>
> > Finally, if the issue of a huge result set is an issue (your 1 million
> > rows) then I would use the TOP {x} (where x is the maximum size) to
> > govern/limit the size of the result set to an "acceptable" overall size.
>
> Like I said, I'm not talking about TOP x. I'm talking about telling
> InterBase to be quick about getting the TOP records because when the user is
> going to navigate to the next page of items to view I am going to be
> executing a query with a new input parameter value and returning just the
> records of interest. I'm not executing the same query again. In fact, that's
> impossible because one hit to the next is not going to be in the same
> transaction. It cannot be rownum based but instead it has to be based on the
> data itself to be truly accurate. Unless of course you are storing the
> results in a results table and then you are left joining the pages of
> information.
>