firebird-architect - Re: [IB-Architect] Syntax for case insensitive sort

Subject	Re: [IB-Architect] Syntax for case insensitive sort
Author	Jan Mikkelsen
Post date	2000-03-29T22:23:55Z

>From: Jim Starkey <jas@...>
>
>Sigh. I have to say I expected this.
>
>Dave (and others): How many other close relatives does CASE/NOCASE
>have? How many classes of collation classes should be expect?
>How many flavors of case insensitivity are the French likely to
>have?
>
>Another question: Do you know of any cases where a "case neutral"
>representation from equivalent upper and lower cases?
>
>I know you are familiar with the famous Knuth article on collation.
>I don't think we need to collate "1812 Overture" as the letters
>forming the German words to 1812, but leaving room clearly useful
>non-English equivalents is important.
>
>Does the collation driver mechanism currently support case changing?
>If so, can a single pair of entrypoints (upcase, downcase) suffice.
>If the French want separate ACCENT/NOACCENT collating sequence, can
>we reasonable expect them to support two different drivers?
>
>How much trouble are we getting ourselves into?

The database has always had the ability to collate on the output of a UDF,
so "order by ucase(field)" is a valid way to get case insensitive collation.
The issue is efficiency. Shock, horror! :-)

I'd like to be able to define an index on the output of a UDF. Like:
"Create [unique] index on tablename (translate(fieldname))".

That way I can easily define my own collation order without needing involve
the engine. I can index and collate in the order "McArthur", "MacDonald",
"McMurtrie", "Salamander", "St. Etienne", "St. George", "Something", as they
are meant to be sorted for phone books. Who knows what the phonebook order
is in Botswana? If I did, I could write the function and get the ordering.

No new order by syntax is required, and the fundamental behaviour of
operators doesn't need to change. Now, I'd like to be able to create new
datatypes and redefine operator behaviour, but that is a whole separate
issue.

There are other advantages:

create table locations (longitude real, latitude real, ...);
create index on locations (distance_from_depot(longitude, latitude));
select blah from locations where distance_from_depot(longitude, latitude) <
whatever;

(Maybe not a great example, but you get the idea.)

Clearly more intelligence is required in the optimiser. It must look for
indexes on expressions, not just columns in base tables.

This is the approach used in Illustra/Postgres influenced systems like
Informix and Tandem NonStop-SQL.

The other option might be to allow user defined collations, and use the
"collate" clause. A standard UDF interface where the field value is passed
in, and a version that sorts correctly using a binary comparison is
returned. Then you need a way to define indexes using a given collation ...

Regards,

Jan.