Subject Re: [firebird-support] UTF8 in firebird ?
Author Lester Caine
Mark Rotteveel wrote:
>> and even you say yourself, in the true standard, utf8 must be encoded
>> in up to 6 char even !:)
> That is not what I said. UTF-8 encoding was originally devised to allow
> for encoding 2^31 - 1 characters using variable length encoding of 1 to 6
> bytes (which is afaik the entire range of unicode codepoints), but because
> UTF16 only encodes 2^16-1 characters and uses surrogate pairs for higher
> order codepoints, the decision was made by the standards committee to only
> use UTF-8 encoding up to 4 bytes, so the same range of characters as UTF16
> could be encoded to make coding between UTF16 and UTF8 easier.

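Just to correct this ...
Unicode needs at most 21 bits, not 31
That is 6 HEX digits
17 planes (0 to 16) from 0x000000 to 0x10FFFF are currently defined.
So every Unicode code point fits in 3 BYTES as a raw value, although
UTF-8 still needs up to 4 bytes to encode the highest ones.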
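
As a quick check of the numbers, here is a minimal Python sketch. The variable and function names are only illustrative, nothing here comes from Firebird itself. It derives the size of the code space from the 0x10FFFF upper limit and shows how many bytes a few sample code points take in UTF-8 and UTF-16:

```python
# Illustrative sketch; MAX_CODE_POINT and encoded_lengths are hypothetical names.
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point

code_space_size = MAX_CODE_POINT + 1        # total number of code points
plane_count = code_space_size // 0x10000    # planes of 65,536 code points each
bits_needed = MAX_CODE_POINT.bit_length()   # bits for the raw code point value


def encoded_lengths(code_point):
    """Return (UTF-8 byte count, UTF-16 byte count) for one code point."""
    ch = chr(code_point)
    # "utf-16-le" avoids the 2-byte BOM that plain "utf-16" would prepend.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


print(f"code points: {code_space_size}, planes: {plane_count}, bits: {bits_needed}")

# ASCII, Latin-1, BMP, supplementary plane, and the very last code point.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600, MAX_CODE_POINT):
    utf8_len, utf16_len = encoded_lengths(cp)
    print(f"U+{cp:06X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

The raw value always fits in 3 bytes, but anything above U+FFFF takes 4 bytes in both UTF-8 and UTF-16 (a surrogate pair in the latter), which is the byte count that matters when sizing a column.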

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php