Subject: Re: [firebird-support] UTF8 in firebird ?
Author: Michael Ludwig
Mark Rotteveel wrote on 07.01.2012 at 10:20 (+0100):

> As I said: in theory UTF-8 could encode to more than 4 bytes as it was
> designed to do that, but the standard committee decided not to use
> more than 4 bytes. So: the encoding scheme that UTF-8 uses *could* use
> more bytes, but the UTF-8 standard does *not allow* use of more than 4
> bytes.

Isn't it rather that UTF-8 just follows the *Unicode* standard which
doesn't make any provisions for codepoints above 1114111 (0x10FFFF) and
hence doesn't require UTF-8 to use more than four bytes for encoding?
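
A quick way to check that claim, as a rough sketch in Python. It leans on
Python's built-in UTF-8 codec; the names `MAX_CODE_POINT` and `utf8_length`
are made up for illustration and don't come from any standard:

```python
# Highest code point the Unicode standard provides for: U+10FFFF (1114111).
MAX_CODE_POINT = 0x10FFFF


def utf8_length(code_point: int) -> int:
    """Return how many bytes Python's UTF-8 codec emits for one code point."""
    return len(chr(code_point).encode("utf-8"))


# Sample code points on either side of each length boundary.
for cp in (0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, MAX_CODE_POINT):
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")

# One past the maximum is not a code point at all, so there is
# nothing for UTF-8 to encode there.
try:
    chr(MAX_CODE_POINT + 1)
except ValueError as exc:
    print("rejected:", exc)
```

The largest code point needs 21 bits, and that is exactly what the
four-byte form carries (3 + 6 + 6 + 6), so nothing longer is ever needed.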

Okay, I took a look at the Unicode 6.0 standard doc, section 3.9, p.93:

    UTF-8 encoding form: The Unicode encoding form that assigns each
    Unicode scalar value to an unsigned byte sequence of one to four
    bytes in length, as specified in Table 3-6 and Table 3-7.

And those tables also show four bytes only.
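
To make the bit layout concrete, here is a minimal hand-rolled encoder in
Python that follows the one-to-four-byte pattern as I read it from the bit
distribution table. The function name `encode_utf8` is my own, and this is
only a sketch for illustration, not a replacement for a real codec:

```python
def encode_utf8(scalar: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (one to four bytes)."""
    # Scalar values are 0..0x10FFFF, excluding the surrogate range.
    if not 0 <= scalar <= 0x10FFFF or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {scalar:#x}")
    if scalar <= 0x7F:
        # 0xxxxxxx
        return bytes([scalar])
    if scalar <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (scalar >> 6),
                      0x80 | (scalar & 0x3F)])
    if scalar <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (scalar >> 12),
                      0x80 | ((scalar >> 6) & 0x3F),
                      0x80 | (scalar & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (scalar >> 18),
                  0x80 | ((scalar >> 12) & 0x3F),
                  0x80 | ((scalar >> 6) & 0x3F),
                  0x80 | (scalar & 0x3F)])


# The highest scalar value still fits in four bytes.
print(encode_utf8(0x10FFFF).hex(" "))  # f4 8f bf bf
```

There is no fifth branch because the range check at the top already
rejects everything above 0x10FFFF.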

So it's a maximum of four bytes per code point for UTF-8 as of Unicode
6.0, period.
--
Michael Ludwig