Subject
Author Ann W. Harrison
At 02:43 PM 1/2/2001 -0700, Jason Wharton wrote:

From someone else:

> > >TOP {n} would be handy ... even though it would go thru all
> > >records in the result set for ordering purposes, only those
> > >within the 10000 window would be kept in the sort file (other
> > >records would be discarded during the fetching).

From me:

> > That wouldn't work as well as you might think, given
> > the sort algorithm used. Besides, the major cost is
> > retrieval, not sorting.

From Jason Wharton:

>The person is only after a fixed number of records and
>has no interest in any beyond that amount. Certainly
>this could make one algorithm more favorable to justify
>this optimization. Scan the records in natural order
>and use less resources to contain the temporary result
>cache is the suggestion. Sounds like a great idea to me.

If the person wants an arbitrary 10K records that have
been sorted, then there's a win here. If we're talking
about the highest or lowest valued 10k records out of
100M or more, then maybe an index walk is faster than
a sort.

I am anything but an academic computer scientist and
dragging out Knuth on Sorting and Searching has given
me a headache. My feeble recollection is that a series
of approximate sorts is n log n, while a straight insert
sort is closer to n * n.

There's probably something essential I've missed, but
identifying the highest (or lowest) 10K out of 100M
pretty much involves sorting the whole 100M - depending
on the order of input. You read in the first 10K, then
look at the next row - if it's lower than your high
value, throw out the low value and stick this one in.
Then find the new low value and repeat.

> > >If ordering can be done *without* a temporary result set,
> > >e.g. via index navigation, you would get results very
> > >quickly (minimal processing before returning a record).
> > >Can the database engine currently handle this?
> >
> > Yes, and for large result sets it is often SLOWER to
> > finish than a sort. The first row comes faster, but
> > the total time is LONGER. Really and truly. Tested
> > and verified.
>
>Right, we aren't questioning that. We want the first row(s)
>as fast as possible.

Yes, I've understood that. I believe Jim has said that
there is (was) code in the optimizer that recognized the
blr_first qualifier and used index navigation.
...

>Leave things as they are now, just give
>us the option to tell the server to act the way we want in our circumstances
>that warrant it.

Fine. I think we're in violent agreement where navigational
indexes are concerned.

>PS. I sure would like to hear back from you concerning the compound
>descending order by optimization issues...

There's an 85% probability that the bug is in (or indicated by)
these two lines from opt.c, routine gen_nav.

|| (ptr [sort->nod_count] && !(idx->idx_flags & idx_descending)) ||
(!ptr [sort->nod_count] && (idx->idx_flags & idx_descending)))

The sort tail is composed of pairs of value node pointers and true/false
flags indicated whether the column is ascending or descending. I think
the referencing here is wrong, but more likely (65%) I'm wrong and the
problem is slightly different.

Regards,

Ann
www.ibphoenix.com
We have answers.

Regards,

Ann
www.ibphoenix.com
We have answers.