Subject | Re: [Firebird-Architect] Re: The Wolf on Firebird 3 |
---|---|
Author | Olivier Mascia |
Post date | 2005-11-17T17:36:40Z |
On 17 Nov 2005, at 17:46, Jim Starkey wrote:
> That would require unpacking every string on reference, probably
> requiring a string scan to compute expanded length, a memory
> allocation, and, eventually, a memory deallocation. In either the
> current record storage format or the new record encoding, a
> pre-allocated descriptor can be set up pointing into the record,
> obviating the need to copy or expand.

Okay, I see the context: mixing utf32 and utf8 is a bad idea. Let me
forget about it. :)

> The following code implements the translation of a proper
> UTF byte sequence:
>
>     UCHAR c = *utf8++;
>     uint code = utf8Values [c];
>     uint length = utf8Lengths [c];
>
>     if (length > 1 && (*utf8 & 0xC0) == 0x80)
>         for (; length > 1; --length)
>             code = (code << 6) | (*utf8++ & 0x3f);
>     else
>         code = c;
>
> [Note: the code intentionally considers an invalid UTF-8 byte as a
> lost 8859-1 character. Whether this is a good or a bad idea, it does
> allow automatic in-place conversion between databases created using
> ISO 8859-1 into Unicode without rebuilding.]

Would need to think a bit more about it, but looks like a nice trick.
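
For anyone reading the quoted fragment in isolation: it relies on two lookup tables, utf8Values and utf8Lengths, whose contents are not shown. Below is a minimal sketch of how they might be filled so that the fragment behaves as described. The table contents are my assumption (sequence length and payload bits of each lead byte, with every other byte mapped to itself), and initUtf8Tables and decodeOne are hypothetical helper names, not anything from the Firebird sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed contents: indexed by the first byte of a sequence. */
static uint32_t utf8Values[256];   /* payload bits carried by the lead byte */
static uint8_t  utf8Lengths[256];  /* total number of bytes in the sequence */

/* Hypothetical helper: fill both tables once at startup. */
static void initUtf8Tables(void)
{
    for (unsigned c = 0; c < 256; ++c) {
        if (c >= 0xC0 && c <= 0xDF) {        /* 110xxxxx: 2-byte sequence */
            utf8Lengths[c] = 2;
            utf8Values[c] = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) { /* 1110xxxx: 3-byte sequence */
            utf8Lengths[c] = 3;
            utf8Values[c] = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF7) { /* 11110xxx: 4-byte sequence */
            utf8Lengths[c] = 4;
            utf8Values[c] = c & 0x07;
        } else {                             /* ASCII, stray continuation, invalid */
            utf8Lengths[c] = 1;
            utf8Values[c] = c;
        }
    }
}

/* Hypothetical wrapper around the quoted fragment: decode one code point
 * and advance *pp past it. As in the fragment, only the first continuation
 * byte is checked, so the input must be NUL-terminated and must not be
 * truncated in the middle of a multi-byte sequence. */
static uint32_t decodeOne(const unsigned char **pp)
{
    assert(pp && *pp);
    const unsigned char *utf8 = *pp;
    unsigned char c = *utf8++;
    uint32_t code = utf8Values[c];
    unsigned length = utf8Lengths[c];

    if (length > 1 && (*utf8 & 0xC0) == 0x80)
        for (; length > 1; --length)
            code = (code << 6) | (*utf8++ & 0x3F);
    else
        code = c;   /* ASCII, or a byte treated as an ISO 8859-1 character */

    *pp = utf8;
    return code;
}
```

With tables like these, a lone 0xE9 followed by an ASCII letter comes out as U+00E9, the same code point as the proper two-byte sequence C3 A9, which is the in-place 8859-1 fallback the note describes.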
--
Olivier