Subject Re: UTF-8 (various)
Author Aleksey Karyakin
"Jim Starkey" <jas@...> wrote in message
> Any index can be used for equality retrievals, which are probably
> 99% of all index retrievals. Only index range retrievals require
> that the collation of the index match the collation of the
> session/query. If all indexes are based on UTF-8, the character set
> used by the client doesn't come into the equation at all.

I'm afraid not. What about case-insensitive, accent-insensitive, etc.
comparisons, which are all language-specific?

There actually are case-insensitive collations in the current codebase,
but I would prefer a different way to handle this. Let a collation
define multi-level key strings to use in indexes and comparisons, and
let the operation itself specify whether it wishes to skip any of the
secondary weights. So CASE-INSENSITIVE, etc. would be an attribute of
the index/operation, not of the collation. Thus we end up with
significantly fewer collations while preserving a wide range of
comparison options.
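
A rough sketch of what I mean, in Python rather than engine code. The
names multilevel_key, operation_key and equal are made up for
illustration, and the weights are derived naively from Unicode
decomposition; a real collation would take them from language-specific
tables. The point is only that the collation produces all levels and
the operation decides which ones to skip.

```python
import unicodedata

# Level names are hypothetical; a real collation may define more levels.
LEVELS = ("base", "accent", "case")

def multilevel_key(text):
    """Split text into per-level weights: base letters, accents, case."""
    base, accents, case = [], [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and accents:
            # Attach a combining mark to the preceding base letter.
            accents[-1] += ch
        else:
            base.append(ch.lower())
            accents.append("")
            # Upper case gets the lower weight so it sorts first.
            case.append(0 if ch.isupper() else 1)
    return {"base": "".join(base), "accent": tuple(accents), "case": tuple(case)}

def operation_key(text, skip=()):
    """Key as seen by one operation: all levels except the skipped ones."""
    levels = multilevel_key(text)
    return tuple(levels[name] for name in LEVELS if name not in skip)

def equal(a, b, skip=()):
    return operation_key(a, skip) == operation_key(b, skip)
```

With this, equal("Resume", "resume", skip={"case"}) is a
case-insensitive comparison and equal("résumé", "resume",
skip={"accent"}) is an accent-insensitive one, both using the same
single collation.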

Also, it would be useful (from an application developer's point of
view) to have a collation that is case-insensitive in comparisons but
case-sensitive in ordering. For example, 'A' = 'a', but 'A' always
sorts before 'a' rather than in random order. I'm not sure if there is
a legal way to handle this. Maybe partial key matches, so that a single
case-sensitive index may still be used to handle a case-insensitive
search?
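
To illustrate the partial key match idea, here is a minimal sketch
under my own assumptions: the index key is a pair (case-folded text,
case flags), the index is just a sorted Python list, and index_key,
build_index and lookup_ci are hypothetical names. A case-insensitive
equality search then matches only the first part of the key, while the
full key still gives a deterministic 'A' before 'a' ordering.

```python
import bisect

def index_key(text):
    # Primary part: case-folded text. Secondary part: case flags,
    # with upper case (0) ordered before lower case (1).
    return (text.lower(), tuple(0 if c.isupper() else 1 for c in text))

def build_index(values):
    """A case-sensitive index: entries sorted by the full two-part key."""
    return sorted((index_key(v), v) for v in values)

def lookup_ci(index, text):
    """Case-insensitive equality via a partial match on the primary part."""
    primary = text.lower()
    keys = [key for key, _ in index]
    # (primary,) sorts before every full key that starts with primary.
    pos = bisect.bisect_left(keys, (primary,))
    matches = []
    for key, value in index[pos:]:
        if key[0] != primary:
            break
        matches.append(value)
    return matches
```

For the values 'a', 'B', 'A', 'b', the index order is A, a, B, b, and
lookup_ci(index, 'a') returns both 'A' and 'a' from that one index.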

Also, why do you think index range retrieval differs from the equality
case? The clause WHERE x >= 'a' AND x <= 'b' would include 'A' and 'B'
for a case-insensitive operation.
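
A small sketch of the range case, again with made-up names (range_ci,
range_cs) and a sorted Python list standing in for an index. It only
shows that the result of the same range predicate depends on whether
case is taken into account, just as it does for equality.

```python
import bisect

def range_ci(values, low, high):
    """Case-insensitive x >= low AND x <= high over a sorted 'index'."""
    # Order by case-folded text first, then by the raw text so that
    # 'A' sorts before 'a'.
    index = sorted(values, key=lambda v: (v.lower(), v))
    primaries = [v.lower() for v in index]
    start = bisect.bisect_left(primaries, low.lower())
    stop = bisect.bisect_right(primaries, high.lower())
    return index[start:stop]

def range_cs(values, low, high):
    """Case-sensitive range by plain code point comparison."""
    return sorted(v for v in values if low <= v <= high)
```

Over the values 'a', 'B', 'c', 'A', 'b', 'ab', the case-insensitive
range 'a'..'b' includes 'A' and 'B', while the case-sensitive one does
not.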

Aleksey Karyakin