Subject | Re: [firebird-support] Re: udf won't do what it should |
---|---|
Author | Helen Borrie |
Post date | 2006-06-26T08:24:29Z |
At 05:45 PM 26/06/2006, Lester Caine wrote:
>Dimitry Sibiryakov wrote:
>
> > On 26 Jun 2006 at 17:00, Helen Borrie wrote:
> >
> >>>and is sequence of one-byte characters.
> >>
> >>Typo. UNICODE_FSS is non-variable three-byte characters, i.e. even
> >>the equivalents of the low-byte ASCII characters are 3 bytes.
> >
> > Really? I always thought that chars in it (like in utf-8) have
> > variable length (up to 3 bytes).

UTF-8 has variable-length characters (I think it is from 1 to 6
bytes, though there are no charsets that actually use 5 or 6 bytes).
But UNICODE_FSS is not UTF-8. It's based on a much older set of
Unicode characters and has this built-in gotcha of being fixed to
exactly 3 bytes.
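For comparison, Python's standard UTF-8 codec shows the variable lengths described above. Note that the current UTF-8 definition (RFC 3629) limits characters to 1 to 4 bytes; the 5- and 6-byte forms are no longer valid. This is a minimal sketch using only the standard library and says nothing about Firebird's internals; the sample characters are arbitrary.

```python
# Byte length of individual characters under UTF-8 (Python's built-in codec).
# Illustration only; this does not call or model Firebird.
samples = ["A", "7", "é", "€", "😀"]


def utf8_lengths(chars):
    """Return a dict mapping each character to its UTF-8 byte length."""
    return {ch: len(ch.encode("utf-8")) for ch in chars}


if __name__ == "__main__":
    for ch, n in utf8_lengths(samples).items():
        print(f"U+{ord(ch):04X} {ch!r}: {n} byte(s)")
```

ASCII characters such as 'A' and '7' take one byte, while characters further up the code space take two, three or four.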

> > May be I had choosen wrong
> > expression? Something like 'sequence of up to three-bytes characters'
> > would br better... My English is not good, you know...
>
>MY understanding of UNICODE_FSS was that it is a 'wide' string format,
>and that 24bits per character were used, so 3 bytes.

So much is true.

>So when you compare strings you do not have to do anything other
>than a simple byte by byte comparison. The 'compression' comes when
>the whole record is compressed to store, not the elements of each raw string.

Not in UNICODE_FSS. It is stored exactly in the 3-byte sequences
with no cute compression algorithms. At the interface, the engine
even converts one-byte characters, such as 'A' or '7', to three-byte
sequences. That's in contrast to the new UTF-8 support that Adriano
introduced in Firebird 2, which does the proper variable-length
encoding and keeps single bytes for the
7-bit characters.
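To make the size difference concrete, the sketch below models the fixed three-byte layout as this thread describes it (24 bits per character, so even 'A' and '7' take 3 bytes) next to real UTF-8. `fixed3_encode` and `compare_sizes` are hypothetical helpers written for this illustration; they are not Firebird's actual UNICODE_FSS storage code.

```python
def fixed3_encode(text: str) -> bytes:
    """Hypothetical: pack every code point into exactly 3 bytes (big-endian).

    Models the 'fixed 3 bytes per character' description in this thread;
    it is NOT Firebird's real UNICODE_FSS implementation.
    """
    out = bytearray()
    for ch in text:
        # 24 bits are enough for every Unicode code point (max U+10FFFF).
        out += ord(ch).to_bytes(3, "big")
    return bytes(out)


def compare_sizes(text: str) -> tuple[int, int]:
    """Return (fixed three-byte size, UTF-8 size) in bytes for text."""
    return len(fixed3_encode(text)), len(text.encode("utf-8"))


if __name__ == "__main__":
    for s in ["A7", "Größe", "€uro"]:
        fixed, utf8 = compare_sizes(s)
        print(f"{s!r}: fixed3={fixed} bytes, utf-8={utf8} bytes")
```

For pure ASCII text the fixed layout is three times the UTF-8 size; the gap narrows as more characters need multi-byte UTF-8 sequences.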
As to "widestring", the experiments I've done with the TNT components
showed that, although they do provide more support for MBCS than the
Borland language environments provide out-of-the-box, they still have
conversion problems with UNICODE_FSS. They accept and store
characters correctly but they fail to display output from UNICODE_FSS
data correctly in a Windows environment. At least that was my
experience about 18 months ago. It could well be that they have
addressed and fixed that bug.
./heLen