Subject Re: [firebird-support] Re: ORDER BY on large VARCHAR columns
Author Ann W. Harrison
At 04:50 PM 10/27/2004, robert_difalco wrote:

>I don't mind taking a hit on insert. They are inserted much less
>frequently than they are read and they are NEVER updated. What do
>you have in mind?

Creating a floating point column that's used just for sorting that
varchar. When you do an insert, find the next lowest and next highest
and set the sort key as sort key value of the lowest plus half the
difference between the lowest and highest.

The crude implementation sorts the whole table once for every insert -
bad, but better than sorting it 20,000 for every read, which is what
you might be doing now, if there were world enough and time.

Slightly less crude, if you insert batches, would be to insert a
batch into a holding table, sort them, sort the stored values, and
merge the two lists, generating new sort keys as you go.

Slightly less crude would be to develop a histogram of your current
data, extrapolate a reasonable histogram of the future, and allocate
ranges of numbers for ranges of values. Store that information in at
table. Then, on insert, you lookup the subset that the new varchar falls
into, and sort only that subset of the table, selected by sort key
value of course, not by the long varchar.

Others will offer many refinements, each less crude than the others.
But I think you'll be happier with an artificial sort key in the long
run. Fat indexes - indexes with huge keys - aren't all that nice.