Subject: Re: [Firebird-Architect] A Fresh Look at Collations
Author: Jim Starkey
Paul Ruizendaal wrote:
>> Knowledge of DUCET is limited to a collation generator.
>>
>
> What do you mean by "collation generator"?
>
The collation generator is a standalone program that reads the DUCET
weights file (http://unicode.org/Public/UCA/5.0.0/allkeys.txt) and a
collation description file (XML) and generates a full collation file
(also XML), which in turn is loaded at runtime. While generating the
collation file, it compacts the weights and computes various
collation-specific parameters for the runtime. It determines, for
example, whether or not the base characters can be represented in a
single byte.
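
To make the compaction step concrete, here is a minimal sketch in Python.
It is not the actual generator. It assumes the allkeys.txt line layout
(hex code points, a semicolon, then bracketed collation elements), looks
only at primary weights, and skips the XML input and output entirely.
The names parse_allkeys, compact_primaries and SAMPLE are hypothetical,
and the weights in SAMPLE are illustrative values, not copied from the
real file.

```python
import re

# One collation element, e.g. [.0E15.0020.0008.0041] or [*0209.0020.0002.0020].
# The leading '*' marks a variable element; '.' marks a regular one.
ELEMENT = re.compile(r"\[([.*])([0-9A-Fa-f.]+)\]")


def parse_allkeys(lines):
    """Map each code-point sequence to the primary weights of its elements."""
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comment
        if not line or line.startswith("@"):  # skip blanks and @version
            continue
        chars, _, elements = line.partition(";")
        key = tuple(int(cp, 16) for cp in chars.split())
        table[key] = [int(m.group(2).split(".")[0], 16)
                      for m in ELEMENT.finditer(elements)]
    return table


def compact_primaries(table, repertoire):
    """Renumber the primary weights used by `repertoire` as dense ranks.

    Returns (remap, fits_in_one_byte). Rank 0 is left free for
    'no primary weight', so at most 255 distinct weights fit in a byte.
    """
    used = set()
    for key, primaries in table.items():
        if all(cp in repertoire for cp in key):
            used.update(w for w in primaries if w != 0)
    remap = {w: rank for rank, w in enumerate(sorted(used), start=1)}
    return remap, len(remap) <= 0xFF


# Illustrative input only; the weights are made up for this sketch.
SAMPLE = """\
@version 5.0.0
0020 ; [*0209.0020.0002.0020] # SPACE
0041 ; [.0E15.0020.0008.0041] # LATIN CAPITAL LETTER A
0061 ; [.0E15.0020.0002.0061] # LATIN SMALL LETTER A
00E1 ; [.0E15.0020.0002.0061][.0000.0032.0002.0301] # LATIN SMALL LETTER A WITH ACUTE
0062 ; [.0E29.0020.0002.0062] # LATIN SMALL LETTER B
"""

table = parse_allkeys(SAMPLE.splitlines())
# Restrict to the 8859-1 repertoire (code points 0 through 255).
remap, fits = compact_primaries(table, range(256))
```

Because the renumbering preserves the relative order of the weights, the
compacted values compare the same way as the originals within the chosen
repertoire. The fits flag is the kind of collation-specific parameter
the runtime could use to pick a one-byte encoding for base characters.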
>
>> The other is to reduce the size of
>> generated keys when the range of code points is known. 8859-1, for
>> example, can't be represented in single byte utf8, but base characters
>> can be represented in a single byte.
>>
>
> Sounds like "bit bumming" to me. Why is this optimisation relevant in
> today's world?
>
Speed and memory efficiency -- both laudable goals.
>
>
>> Isn't it funny how something as intrinsically dull as collations can
>> appear interesting?
>>
>
> When I started out in business I had a boss who said that there was no
> such thing as a dull business segment; the dull appearance was because one
> didn't know enough about it. I thought it was BS at the time, but 25 years
> later I tend to agree with him.
>


--
Jim Starkey
Founder, NimbusDB, Inc.
978 526-1376


