Subject Re: [firebird-support] Re: ORDER BY on large VARCHAR columns
Author unordained
There's also the case where the new row is greater than the greatest value or lesser than the least
value -- you'd probably want to make sure you leave "plenty" of room between the existing range and
the new value, for future inserts. There's the math / precision problem: how quickly will the
floating point's precision catch up with someone using this method, what would ideal values be in
the long run?

Would it be acceptable to use a system in which "0.00" and "1.00" are the absolute possible range,
and all items will be somewhere in that range? (If greater than greatest, use average of (greatest
and 1.00), if lesser than least, use average of (least and 0.00) -- first item is assigned value of
0.5, etc.?)

(Would it be possible for firebird to someday have an index system that doesn't store compressed
data in the index, but only row order/pointer/whatever? It wouldn't be as efficient, but it would
be more flexible, supporting any datatype that can be sorted.)

The already proposed solution (index just part of the string) could be used during inserts, to make
that phase (assigning a floating-point value) faster, no? Hmmm, finding next-greatest seems okay,
finding next-lesser could be messy. (What with sorting truncated strings.)

[Sorry I forgot that multi-column indices eat more space. I've only ever used them against
integers.]

-Philip

---------- Original Message -----------
From: "Ann W. Harrison" <aharrison@...>
To: firebird-support@yahoogroups.com, firebird-support@yahoogroups.com
Sent: Wed, 27 Oct 2004 17:25:04 -0400
Subject: Re: [firebird-support] Re: ORDER BY on large VARCHAR columns

> At 04:50 PM 10/27/2004, robert_difalco wrote:
>
> >I don't mind taking a hit on insert. They are inserted much less
> >frequently than they are read and they are NEVER updated. What do
> >you have in mind?
>
> Creating a floating point column that's used just for sorting that
> varchar. When you do an insert, find the next lowest and next highest
> and set the sort key as sort key value of the lowest plus half the
> difference between the lowest and highest.
>
> The crude implementation sorts the whole table once for every insert -
> bad, but better than sorting it 20,000 for every read, which is what
> you might be doing now, if there were world enough and time.
>
> Slightly less crude, if you insert batches, would be to insert a
> batch into a holding table, sort them, sort the stored values, and
> merge the two lists, generating new sort keys as you go.
>
> Slightly less crude would be to develop a histogram of your current
> data, extrapolate a reasonable histogram of the future, and allocate
> ranges of numbers for ranges of values. Store that information in at
> table. Then, on insert, you lookup the subset that the new varchar falls
> into, and sort only that subset of the table, selected by sort key
> value of course, not by the long varchar.
>
> Others will offer many refinements, each less crude than the others.
> But I think you'll be happier with an artificial sort key in the long
> run. Fat indexes - indexes with huge keys - aren't all that nice.
>
> Regards,
>
> Ann
>
> ------------------------ Yahoo! Groups Sponsor --------------------~-->
> $9.95 domain names from Yahoo!. Register anything.
> http://us.click.yahoo.com/J8kdrA/y20IAA/yQLSAA/67folB/TM
> --------------------------------------------------------------------~->
>
> Yahoo! Groups Links
>
>
>
------- End of Original Message -------