firebird-architect - Re: [Firebird-Architect] Re: The Wolf on Firebird 3

Subject	Re: [Firebird-Architect] Re: The Wolf on Firebird 3
Author	Alex Peshkov
Post date	2005-11-03T16:19:12Z

Jim Starkey wrote:

>Alex Peshkov wrote:
>
>>>ASCII is the lower 7 bits of Unicode, so all ASCII strings are valid
>>>UTF-8 strings. Any UDF expecting and receiving ASCII will work just
>>>fine. Any UDF depending solely on string length or string termination
>>>will probably work just fine as well. For most string processing, UTF-8
>>>and ASCII are interchangeable. The only sticking point is where the
>>>code makes an assumption about the number of glyphs vs. the number of
>>>bytes, something that doesn't happen often in database functions.
>>
>>I'm afraid it's likely to have problems with Cyrillic strings. They are
>>currently stored in a single-byte format (CP1251), and a lot of UDFs
>>expect single-byte characters. In UTF-8, Cyrillic characters take 2 bytes.
>
>Could you describe the problems you expect? There may be
>straightforward ways to address them.
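
To make the byte-versus-character point above concrete, here is a minimal C sketch. The helper utf8_length and the three sample arrays are hypothetical names made up for illustration, not Firebird APIs, and the helper assumes valid UTF-8. It counts code points by skipping UTF-8 continuation bytes (10xxxxxx): an ASCII word has the same byte and character count, while the Cyrillic word "Привет" is 6 bytes in CP1251 but 12 bytes in UTF-8, still 6 characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a NUL-terminated UTF-8
   string by skipping continuation bytes (bit pattern 10xxxxxx).
   Assumes the input is valid UTF-8. */
size_t utf8_length(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* The same word three ways (hypothetical sample data). */
static const char ascii_word[]  = "Privet";  /* ASCII: identical bytes in UTF-8 */
static const char utf8_word[]   =            /* "Привет" in UTF-8: 2 bytes per letter */
    "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";
static const char cp1251_word[] =            /* "Привет" in CP1251: 1 byte per letter */
    "\xCF\xF0\xE8\xE2\xE5\xF2";
```

Here strlen(utf8_word) is 12 while utf8_length(utf8_word) is 6; for ascii_word both are 6. Any UDF that treats strlen() as a character count is exactly the kind of code that breaks.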

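I have a UDF substr(string, from, count). It does the following (range
checks omitted):
/* from and count are byte offsets; ib_util_malloc lets the engine free the result */
char *ss = (char *)ib_util_malloc(count + 1);
memcpy(ss, &string[from], count);
ss[count] = 0; /* NUL-terminate the copy */
return ss;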

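Obviously this will not work as expected with a UTF-8 Cyrillic string:
from and count are byte offsets, so with 2-byte characters they select
the wrong characters and can even split a character in half.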
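
A character-aware variant is possible if from and count are reinterpreted as character positions. The sketch below is hypothetical: the names utf8_skip and utf8_substr are made up, it assumes valid UTF-8 input and zero-based positions (as &string[from] implies), and it uses malloc where a real UDF would use ib_util_malloc so the engine can free the result.

```c
#include <stdlib.h>
#include <string.h>

/* Advance over up to n UTF-8 characters, stopping at the terminating NUL.
   A character is a lead byte plus its continuation bytes (10xxxxxx). */
static const char *utf8_skip(const char *p, long n)
{
    while (n > 0 && *p != '\0') {
        p++;                                   /* past the lead byte */
        while (((unsigned char)*p & 0xC0) == 0x80)
            p++;                               /* past continuation bytes */
        n--;
    }
    return p;
}

/* Hypothetical character-based substr: from and count are counted in
   characters, not bytes. Overruns are clamped to the end of the string.
   Uses malloc; a real Firebird UDF would allocate with ib_util_malloc. */
char *utf8_substr(const char *string, long from, long count)
{
    const char *start = utf8_skip(string, from);
    const char *end = utf8_skip(start, count);
    size_t bytes = (size_t)(end - start);
    char *ss = (char *)malloc(bytes + 1);

    if (ss == NULL)
        return NULL;
    memcpy(ss, start, bytes);
    ss[bytes] = '\0';
    return ss;
}
```

Taking 3 characters from position 1 of "Привет" returns "рив" (6 bytes), whereas the byte-based version would return 3 bytes starting in the middle of "П".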
Ann's suggestion to have an interface layer that delivers strings to and
from old UDFs in their desired character sets should definitely help.
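
One way such an interface layer might look, as a rough sketch only: convert the UTF-8 argument to CP1251 before calling the old UDF, then convert its result back to UTF-8. The function names are hypothetical, and the mapping is deliberately partial, covering only ASCII, the basic Cyrillic letters U+0410..U+044F (CP1251 0xC0..0xFF) and Ё/ё (CP1251 0xA8/0xB8); anything else becomes '?'. A real layer would use a complete conversion table.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: UTF-8 -> CP1251 (partial mapping, see above).
   Assumes valid UTF-8. Caller frees the result. */
char *utf8_to_cp1251(const char *in)
{
    char *out = (char *)malloc(strlen(in) + 1);  /* never more bytes than input */
    char *o = out;
    const unsigned char *p = (const unsigned char *)in;

    if (out == NULL)
        return NULL;
    while (*p != '\0') {
        unsigned long cp;
        if (*p < 0x80) {                                   /* ASCII */
            cp = *p++;
        } else if ((*p & 0xE0) == 0xC0 && p[1] != '\0') {  /* 2-byte sequence */
            cp = ((unsigned long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
            p += 2;
        } else {                                           /* longer: skip it */
            cp = 0xFFFD;
            p++;
            while ((*p & 0xC0) == 0x80)
                p++;
        }
        if (cp < 0x80)
            *o++ = (char)cp;
        else if (cp >= 0x410 && cp <= 0x44F)
            *o++ = (char)(0xC0 + (cp - 0x410));
        else if (cp == 0x401)
            *o++ = (char)0xA8;                             /* Ё */
        else if (cp == 0x451)
            *o++ = (char)0xB8;                             /* ё */
        else
            *o++ = '?';
    }
    *o = '\0';
    return out;
}

/* Hypothetical: CP1251 -> UTF-8, for results coming back from an old UDF. */
char *cp1251_to_utf8(const char *in)
{
    char *out = (char *)malloc(2 * strlen(in) + 1);  /* at most 2 bytes per char */
    char *o = out;
    const unsigned char *p = (const unsigned char *)in;

    if (out == NULL)
        return NULL;
    for (; *p != '\0'; p++) {
        unsigned long cp;
        if (*p < 0x80) { *o++ = (char)*p; continue; }
        if (*p >= 0xC0)      cp = 0x410 + (*p - 0xC0);
        else if (*p == 0xA8) cp = 0x401;
        else if (*p == 0xB8) cp = 0x451;
        else { *o++ = '?'; continue; }
        *o++ = (char)(0xC0 | (cp >> 6));      /* 110xxxxx */
        *o++ = (char)(0x80 | (cp & 0x3F));    /* 10xxxxxx */
    }
    *o = '\0';
    return out;
}
```

Passing the UTF-8 bytes of "Привет" through utf8_to_cp1251 yields the 6-byte CP1251 string CF F0 E8 E2 E5 F2, which the unmodified byte-based substr can slice correctly; cp1251_to_utf8 then restores the UTF-8 form of its result.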