Subject | Re: [Firebird-Architect] A Fresh Look at Collations |
---|---|
Author | Jim Starkey |
Post date | 2010-06-22T03:12:05Z |
Knowledge of DUCET is limited to a collation generator. Collations
don't have to be generated by the collation generator, but
it's a lot simpler for a collation designer to specify digressions from
DUCET than to start from the beginning.
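To make "digressions from DUCET" concrete, here is a minimal sketch of a generator that copies a base weight table and applies a designer's overrides. Everything in it is hypothetical: the names `BASE_WEIGHTS`, `generate_collation` and `sort_key` are invented for illustration, and the weights are made up rather than real DUCET values.

```python
# Hypothetical stand-in for DUCET: invented primary weights, not real values.
BASE_WEIGHTS = {"a": 10, "b": 20, "c": 30, "d": 40, "z": 260}


def generate_collation(base, digressions):
    """Return a new weight table: the base table with the digressions applied."""
    table = dict(base)          # leave the base table untouched
    table.update(digressions)   # the designer only states what differs
    return table


def sort_key(text, table):
    """Map each character to its primary weight; unknown characters sort last."""
    return [table.get(ch, 0xFFFF) for ch in text]


# An idiosyncratic collation in which "z" sorts between "a" and "b".
custom = generate_collation(BASE_WEIGHTS, {"z": 15})

words = ["b", "z", "a"]
default_order = sorted(words, key=lambda w: sort_key(w, BASE_WEIGHTS))
custom_order = sorted(words, key=lambda w: sort_key(w, custom))
```

The designer supplies one entry instead of a whole table, which is the point of starting from the default.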
There are two reasons for custom collations. One is to handle, er,
idiosyncratic collation sequences. The other is to reduce the size of
generated keys when the range of code points is known. ISO 8859-1, for
example, can't be represented in single-byte UTF-8, but its base
characters can be represented in a single byte.
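A rough illustration of the key-size point, under assumptions of my own: the helper `base_key` is hypothetical, and it uses the raw Latin-1 byte value of each base character as its key, which shows the size saving but is not a real collation order.

```python
import unicodedata


def base_key(text):
    """One byte per base character, assuming all input is within ISO 8859-1."""
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: the byte value stands in for a primary weight.
    return base.encode("latin-1")


text = "café"
utf8_length = len(text.encode("utf-8"))   # "é" needs two bytes in UTF-8
key_length = len(base_key(text))          # one byte per base character
```

A real generator would assign weights in collation order and carry accent and case differences in later key levels; the sketch only shows why a known, small code point range allows a shorter primary key.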
Loadable collations, however, are the big win, eliminating a vast
quantity of mind-deadening code. The trick is the set of additional
rules, each of which must be hard coded. Still, these are exceptions to
the big picture, not an alternative universe.
Isn't it funny how something as intrinsically dull as collations can
appear interesting?
Paul Ruizendaal wrote:
>> There is a Default Unicode Collation Element Table (DUCET) that defines,
>> well, the default Unicode collation. But it is only the default. A
>> collation designer can implement any rules he or she wants to.
>
> Am I right in thinking that you are proposing a setup similar to collation
> in MySQL 4.1+ for Nimbus? Perhaps based on the ICU library reference
> implementation?
>
> With reference to your original post, I would personally probably hard
> code the DUCET tables and only make the customizations loadable, as my
> perception is that the base tables are large and that the customizations
> are small. The equality levels are an interesting thought, but perhaps
> better handled via a function (i.e. upcase(str1)==upcase(str2), or
> l1(str1)==l1(str2))?
>
> I'm not deeply into collations, but it seems to me you are on the right
> track.
>
> Paul