Subject Re: [IBO] Re: Fields with collation=PXW_SLOV consume 3 bytes per char?
Author Helen Borrie
At 01:43 PM 14/10/2003 +0000, you wrote:
>Helen,
>
> > Someone can, but it's absolutely off-topic here. Just this once...
>
>I'm really sorry. It won't happen again. Thanks! :)
>
> > You guessed right. The non-binary collations eat a lot of bytes
> > <snip>
>
>Hmmm... This is as I thought then. But what really puzzles me is the
>fact that, assuming PXW_SLOV means Slovenian, this collation *should*
>be binary. There are no >255 characters in our character set. Our
>alphabet only uses like 5 non-English characters (times 2 for
>upper/lower) which I thought were all part of the ANSI.
>
>Their byte numbers are as follows:
>
>Latin Letter S With Caron (upper=138, lower=154)
>Latin Letter Z With Caron (upper=142, lower=158)
>Latin Letter C With Acute (upper=198, lower=230)
>Latin Letter C With Caron (upper=200, lower=232)
>Latin Letter D With Stroke (upper=208, lower=240)
>
>On the operating system level we generally use either Windows Central
>Europe or ISO-8859-2 charsets, which both contain all these characters.
>With Firebird I'm using WIN1250 charset which is essentially Windows
>Central European, IIRC.
>
>So the problem is, I don't understand why any collation of such
>character sets has to be multibyte in the first place.
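
For what it's worth, the byte values quoted above can be checked with a short sketch. It assumes Python's "cp1250" codec is a fair stand-in for Firebird's WIN1250 character set; the table name SLOVENIAN_BYTES is just made up for illustration.

```python
# Assumption: Python's "cp1250" codec stands in for Firebird's WIN1250.
# SLOVENIAN_BYTES is a hypothetical name; the values are the ones quoted above.
SLOVENIAN_BYTES = {
    "Š": 138, "š": 154,
    "Ž": 142, "ž": 158,
    "Ć": 198, "ć": 230,
    "Č": 200, "č": 232,
    "Đ": 208, "đ": 240,
}

def encoded_lengths(table):
    """Return {char: number of bytes in cp1250} for every character in the table."""
    return {ch: len(ch.encode("cp1250")) for ch in table}

def matches_quoted_values(table):
    """True if each character encodes to exactly the single byte value listed."""
    return all(ch.encode("cp1250") == bytes([code]) for ch, code in table.items())

if __name__ == "__main__":
    print(encoded_lengths(SLOVENIAN_BYTES))
    print(matches_quoted_values(SLOVENIAN_BYTES))
```

If the assumption holds, every one of these characters is stored as a single byte, so the storage of the data itself is not where the extra bytes go.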

I don't think it is multi-byte. It's the collations themselves that
eat bytes: it's not a question of *which* characters are involved, but of
the several levels of precedence in the ordering, which apply to upper-case
and lower-case characters separately.
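
To give a rough idea of how a multi-level ordering can cost more than one byte per character, here is a minimal sketch of a generic three-level sort key. It is NOT Firebird's actual PXW_SLOV implementation, and it does not follow real Slovenian sorting rules; the tables, weights and the function name sort_key are all invented for illustration.

```python
# Hypothetical tables: map each accented letter to a base letter and a rank.
# These weights are illustrative only, not taken from any real collation.
DIACRITIC_BASE = {"č": "c", "ć": "c", "đ": "d", "š": "s", "ž": "z"}
DIACRITIC_RANK = {"č": 1, "ć": 2, "đ": 1, "š": 1, "ž": 1}

def sort_key(text):
    """Build a byte key with three levels: base letter, diacritic, case.

    Only handles ASCII letters plus the five accented letters above.
    """
    primary, secondary, tertiary = [], [], []
    for ch in text:
        lower = ch.lower()
        base = DIACRITIC_BASE.get(lower, lower)
        primary.append(ord(base))                           # level 1: base letter
        secondary.append(DIACRITIC_RANK.get(lower, 0) + 1)  # level 2: diacritic
        tertiary.append(2 if ch != lower else 1)            # level 3: case
    # A zero byte separates the levels so shorter strings sort first.
    return bytes(primary + [0] + secondary + [0] + tertiary)

if __name__ == "__main__":
    word = "Češnja"
    print(len(word.encode("cp1250")), len(sort_key(word)))
    print(sorted(["česta", "Cesta", "cesta"], key=sort_key))
```

In this scheme each character contributes one byte to each of the three levels, so the key is roughly three times as long as the stored string, even though every character is single-byte in the character set.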

Please take the question to Firebird-devel. The people who make the
collations can explain it better than I can. And it IS off-topic for IBO.

Helen