Subject RE: [Firebird-Architect] More bulk loader - sorted data
Author Claudio Valderrama C.
> -----Original Message-----
> From: Firebird-Architect@yahoogroups.com
> [mailto:Firebird-Architect@yahoogroups.com]On Behalf Of Bill Oliver
> Sent: Saturday, November 10, 2007 11:33
>
> Hi all,
>
> I often see cases where data to be bulk loaded is pre-sorted. After
> the bulk load an index will be created - either a primary key or maybe
> a unique constraint.

Pre-sorted still doesn't rule out duplication. :-)
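
To make the point concrete, here is a minimal sketch of the single-pass check a loader could run while streaming keys, assuming the keys are directly comparable values. The function name `first_order_violation` is hypothetical and is not part of any Firebird or Vulcan API.

```python
def first_order_violation(keys):
    """Return (index, reason) for the first key that breaks strictly
    ascending order, or None if the input is sorted and free of duplicates.

    Hypothetical helper for illustration only.
    """
    previous = None
    seen_any = False
    for index, key in enumerate(keys):
        if seen_any:
            if key == previous:
                return (index, "duplicate")
            if key < previous:
                return (index, "out of order")
        previous = key
        seen_any = True
    return None


# Sorted input can still contain a repeated key.
print(first_order_violation([1, 2, 2, 3]))  # (2, 'duplicate')
```

The check costs one comparison per row, so a loader that is told "this is sorted" can still verify the claim cheaply instead of trusting it.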


> Would it be helpful to tell the bulk loader that the data to be loaded
> is pre-sorted by some column?

When it's an integer type, easy. When it's date-time, and it's ordered as
date-time (instead of as a string), reasonable. When it's to be put in a blob, we
would rely 100% on the user (we don't sort by blob contents).
But what about string fields? For ASCII it's easy, but for other cases,
would the order of the user-supplied data match our collations?
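
A small sketch of the string problem: the same rows can be sorted under one ordering and unsorted under another. Here plain code-point comparison stands in for "how the user sorted the file" and `str.casefold` stands in for a case-insensitive collation. Both are assumptions for illustration and neither is a Firebird collation. The helper name `is_sorted_under` is hypothetical.

```python
def is_sorted_under(values, sort_key):
    """True if values are in non-descending order once mapped through sort_key."""
    keys = [sort_key(v) for v in values]
    return all(a <= b for a, b in zip(keys, keys[1:]))


rows = ["Zebra", "apple", "banana"]

# Sorted by raw code points: 'Z' (90) comes before 'a' (97).
sorted_by_code_point = is_sorted_under(rows, lambda s: s)

# Not sorted once case is ignored: "zebra" belongs after "banana".
sorted_by_casefold = is_sorted_under(rows, str.casefold)

print(sorted_by_code_point, sorted_by_casefold)  # True False
```

So a "pre-sorted" hint on a string column would only be usable if the loader knew which ordering the data was sorted under, or re-verified it against the column's collation.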


> Ideally, this information could be leveraged to make creation of index
> quicker.

Sure.

> I had a case recently where 30,000,000 rows were inserted into Vulcan
> table, and creating the PK took 30 minutes. The data for the PK column
> was already sorted and unique.

I seem to remember that Quicksort performs badly (quadratic time with a naive
pivot choice) when data is already sorted, but our quick() routine contains a
cheap trick to avoid this.
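
Which trick quick() actually uses is not stated in this thread. As an illustration only, the sketch below compares a first-element pivot with a median-of-three pivot, which is one common remedy, on already-sorted input. The function `quicksort` and its `median_of_three` flag are hypothetical. Only comparisons made during partitioning are counted.

```python
def quicksort(items, median_of_three):
    """Sort a copy of items; return (sorted_list, partition_comparisons).

    Illustrative sketch, not Firebird's quick() routine.
    """
    a = list(items)
    comparisons = 0
    stack = [(0, len(a) - 1)]  # explicit stack avoids deep recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        if median_of_three:
            mid = (lo + hi) // 2
            # Order a[lo] <= a[mid] <= a[hi], then move the median to a[lo].
            if a[mid] < a[lo]:
                a[lo], a[mid] = a[mid], a[lo]
            if a[hi] < a[lo]:
                a[lo], a[hi] = a[hi], a[lo]
            if a[hi] < a[mid]:
                a[mid], a[hi] = a[hi], a[mid]
            a[lo], a[mid] = a[mid], a[lo]
        pivot = a[lo]
        boundary = lo  # a[lo+1 .. boundary] holds elements smaller than pivot
        for i in range(lo + 1, hi + 1):
            comparisons += 1
            if a[i] < pivot:
                boundary += 1
                a[boundary], a[i] = a[i], a[boundary]
        a[lo], a[boundary] = a[boundary], a[lo]  # pivot to its final place
        stack.append((lo, boundary - 1))
        stack.append((boundary + 1, hi))
    return a, comparisons


presorted = list(range(1000))
naive_sorted, naive_cmp = quicksort(presorted, median_of_three=False)
mo3_sorted, mo3_cmp = quicksort(presorted, median_of_three=True)
print(naive_cmp, mo3_cmp)
```

With the first element as pivot, sorted input of n keys costs n(n-1)/2 partition comparisons because every split is maximally lopsided. Taking the median of the first, middle, and last keys splits sorted input near the middle, and the count drops to roughly n log n.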

> Don't know if this information would be useful or not. Opinions?

Yes, provided that the user knows what he/she is really doing.

C.