firebird-support - Re: [firebird-support] Re: udf won't do what it should

Subject	Re: [firebird-support] Re: udf won't do what it should
Author	Helen Borrie
Post date	2006-06-26T08:24:29Z

At 05:45 PM 26/06/2006, you wrote:

>Dimitry Sibiryakov wrote:
>
> > On 26 Jun 2006 at 17:00, Helen Borrie wrote:
> >
> >>>and is sequence of one-byte characters.
> >>
> >>Typo. UNICODE_FSS is non-variable three-byte characters, i.e. even
> >>the equivalents of the low-byte ASCII characters are 3 bytes.

Dmitry S. wrote:

> >
> > Really? I always thought that chars in it (like in utf-8) have
> > variable length (up to 3 bytes).

UTF-8 has variable-length characters (I think the original design
allowed from 1 to 6 bytes, though the current definition, RFC 3629,
stops at 4 and nothing actually uses 5 or 6 bytes).
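
A minimal Python sketch of that variable-length behaviour, assuming
only the standard library's UTF-8 codec. The sample characters and the
helper name `utf8_len` are arbitrary choices for illustration:

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# One sample each from the ASCII, Latin-1, BMP and supplementary ranges.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
lengths = {ch: utf8_len(ch) for ch in samples}

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} takes {n} byte(s) in UTF-8")
```

Each sample lands in a different length class, which is the point:
the byte count depends on the code point, not on the column.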

But UNICODE_FSS is not UTF-8. It's based on a much older set of
Unicode characters and has this built-in gotcha of being fixed to
exactly 3 bytes.

>Maybe I had chosen the wrong
> > expression? Something like 'sequence of up to three-byte characters'
> > would be better... My English is not good, you know...

Lester Caine wrote:

>My understanding of UNICODE_FSS was that it is a 'wide' string format,
>and that 24 bits per character were used, so 3 bytes.

So much is true.

>So when you compare strings you do not have to do anything other
>than a simple byte by byte comparison. The 'compression' comes when
>the whole record is compressed to store, not the elements of each raw string.

Not in UNICODE_FSS. It is stored as exact 3-byte sequences,
with no cute compression algorithms. At the interface, the engine
even converts one-byte characters, such as 'A' or '7' to three-byte
sequences. That's in contrast to the new UTF-8 support that Adriano
introduced in Firebird 2, which does do the proper morphing of the
7-bit characters.
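
The size difference can be sketched as simple arithmetic. This is only
a model of the fixed three-bytes-per-character layout as described in
this message, not Firebird's actual storage code; `fixed_width_size`
and `utf8_size` are hypothetical helper names:

```python
def fixed_width_size(text: str, bytes_per_char: int = 3) -> int:
    """Bytes needed if every character fills a fixed-width slot
    (the 3-byte layout described above is the assumed default)."""
    return len(text) * bytes_per_char

def utf8_size(text: str) -> int:
    """Bytes needed when the same text is encoded as UTF-8."""
    return len(text.encode("utf-8"))

sample = "A7"                       # two 7-bit characters
fixed = fixed_width_size(sample)    # 2 characters * 3 bytes each
variable = utf8_size(sample)        # 1 byte per 7-bit character

print(f"fixed-width: {fixed} bytes, UTF-8: {variable} bytes")
```

Under this model, plain ASCII text costs three times as much in the
fixed-width layout as it does in UTF-8.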

As to "widestring", what my experiments with the TNT components
showed is that, although they do provide more support for MBCS than
the Borland language environments provide out-of-the-box, they still
have conversion problems with UNICODE_FSS. They accept and store
characters correctly but they fail to display output from UNICODE_FSS
data correctly in a Windows environment. At least that was my
experience about 18 months ago. It could well be that they have
addressed and fixed that bug.

./heLen