firebird-architect - Re: [Firebird-Architect] Re: The Wolf on Firebird 3

Subject	Re: [Firebird-Architect] Re: The Wolf on Firebird 3
Author	Olivier Mascia
Post date	2005-11-17T17:36:40Z

On 17 Nov 2005, at 17:46, Jim Starkey wrote:

> That would require unpacking every string on reference, probably
> requiring a string scan to compute expanded length, a memory
> allocation, and, eventually, a memory deallocation. In either the
> current record storage format or the new record encoding, a
> pre-allocated descriptor can be set up pointing into the record,
> obviating the need to copy or expand.

Okay, I see the context: mixing UTF-32 and UTF-8 is a bad idea. Let me
forget about it. :)

> The following code implements the translation of a proper
> UTF byte sequence:
>
> UCHAR c = *utf8++;
> uint code = utf8Values [c];
> uint length = utf8Lengths [c];
>
> if (length > 1 && (*utf8 & 0xC0) == 0x80)
>     for (; length > 1; --length)
>         code = (code << 6) | (*utf8++ & 0x3f);
> else
>     code = c;
>
> [Note: the code intentionally considers an invalid UTF-8 byte as a
> lost 8859-1 character. Whether this is a good or a bad idea, it does
> allow automatic in-place conversion between databases created using
> ISO 8859-1 into Unicode without rebuilding.]

Would need to think a bit more about it, but looks like a nice trick.
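
For reference, here is a minimal, self-contained sketch of the quoted idea in plain C. It is not Jim's actual code: the table contents, initUtf8Tables() and decodeOne() are assumptions made for illustration (the quoted snippet only shows that tables named utf8Values and utf8Lengths exist). Unlike the quoted snippet, this version checks every continuation byte and the end of the buffer before accepting a multi-byte sequence. It does not reject overlong three- or four-byte forms or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup tables, indexed by lead byte. */
static uint8_t  utf8Lengths[256];  /* sequence length implied by the byte */
static uint32_t utf8Values[256];   /* payload bits carried by the byte    */

/* Fill the tables. Bytes that cannot start a UTF-8 sequence (stray
   continuation bytes, 0xC0, 0xC1, 0xF5..0xFF) get length 1 and their own
   value, i.e. they are treated as ISO 8859-1 characters. */
static void initUtf8Tables(void)
{
    for (int c = 0; c < 256; ++c) {
        if (c >= 0xC2 && c <= 0xDF) {
            utf8Lengths[c] = 2; utf8Values[c] = (uint32_t)(c & 0x1F);
        } else if (c >= 0xE0 && c <= 0xEF) {
            utf8Lengths[c] = 3; utf8Values[c] = (uint32_t)(c & 0x0F);
        } else if (c >= 0xF0 && c <= 0xF4) {
            utf8Lengths[c] = 4; utf8Values[c] = (uint32_t)(c & 0x07);
        } else {
            utf8Lengths[c] = 1; utf8Values[c] = (uint32_t)c;
        }
    }
}

/* Decode one code point from [*p, end) and advance *p past it.
   Precondition: *p < end and initUtf8Tables() has been called.
   If the lead byte is not followed by the expected continuation bytes,
   the lead byte alone is returned as an ISO 8859-1 character. */
static uint32_t decodeOne(const unsigned char **p, const unsigned char *end)
{
    const unsigned char *s = *p;
    unsigned char c = *s++;
    unsigned length = utf8Lengths[c];
    uint32_t code = utf8Values[c];

    if (length > 1) {
        if ((size_t)(end - s) >= length - 1) {
            unsigned i;
            for (i = 0; i < length - 1; ++i) {
                if ((s[i] & 0xC0) != 0x80)
                    break;                      /* not a continuation byte */
                code = (code << 6) | (uint32_t)(s[i] & 0x3F);
            }
            if (i == length - 1) {              /* whole sequence was valid */
                *p = s + i;
                return code;
            }
        }
        code = c;                               /* fall back to 8859-1 */
    }

    *p = s;                                     /* consume one byte only */
    return code;
}
```

With these tables, the bytes C3 A9 decode to U+00E9, and a lone E9 byte followed by an ASCII letter also yields U+00E9, which is the in-place 8859-1 compatibility described in the note. The trade-off is that an 8859-1 string which happens to contain a valid UTF-8 sequence is silently read as UTF-8.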

--
Olivier