Subject Re: [Firebird-Architect] Re: Some benchmarks about 'Order by' - temporary
Author Pavel Cisar
Hi,

Vlad Horsun wrote:
>
> What i mean is that sorted runs contains whole records and during
> merge phase we must read it from disk and write back merged larger
> runs. Formally we need only keys to compare and merge. Therefore
> we have different sort time for records of different size but with keys
> of equal size. More record_size/key_size more performance loss we
> have.

Just an idea.

In the project I was involved in the past that had to sort records of
various size in sets of various size we had a great success with
building B-tree index on the fly + read in index order instead sorting
the set. Sort performance degrades rapidly with set size (size*row
length in fact), while time requirements to build the index was more or
less linear with set size. Index was a balanced B-tree organized in
pages of fixed size. Index key values weren't stored in index directly,
but as pointers to the row in base set. Index was hence very dense and
could be held in memory even for large sets. Of course, once the set
pages / clusters has to be flushed on disk, there was a penalty in index
building when set page has to be read to retrieve the key value, but
anyway, it was quite fast. Lightning fast when everything fits into
memory (that are bigger and cheaper by every year), and I guess that
this strategy could be enhanced and optimized for better performance
when base set doesn't fit into memory (we didn't spend a lot of time to
try optimize it, as results we had were good enough for our needs).

best regards
Pavel Cisar
IBPhoenix