Subject Re: [firebird-support] UTF8 in firebird ?
Author Mark Rotteveel
On 6-1-2012 11:07, Vander Clock Stephane wrote:
> of course I was speaking about code points! not (yet) so crazy as to
> think I can put all the symbols on earth in 1 byte :)
> my index works perfectly, my sorting no (and of course)!
> this is why I wrote this paper about utf8; if not I would stay with
> my ISO8859_1 column and everything would be perfect

Unicode code points are not bytes; they are the 'abstract' numeric
identifiers of the characters in the Unicode set. Their values range
from 0x0 to 0x10FFFF. Only when you apply an encoding like UTF-8, UTF-16
or UTF-32 are they translated into actual bytes for storage. As it
stands, storing a raw code point as is, without an encoding, would take
a fixed 3 bytes (0x10FFFF needs 21 bits).
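
To make the difference between a code point and its encoded bytes
concrete, here is a small Python sketch. The helper name describe() is
made up for this illustration; it only uses the standard ord() and
str.encode() calls and has nothing to do with Firebird itself.

```python
# Largest valid Unicode code point.
MAX_CODEPOINT = 0x10FFFF

# Bytes needed to hold any raw code point without an encoding:
# 21 bits, rounded up to whole bytes.
RAW_BYTES = (MAX_CODEPOINT.bit_length() + 7) // 8


def describe(ch: str) -> dict:
    """Return the code point of one character and its size in each encoding."""
    return {
        "codepoint": ord(ch),                    # abstract number, not bytes
        "utf-8": len(ch.encode("utf-8")),        # 1 to 4 bytes
        "utf-16": len(ch.encode("utf-16-be")),   # 2 or 4 bytes
        "utf-32": len(ch.encode("utf-32-be")),   # always 4 bytes
    }
```

For example, describe("A") reports 1 byte in UTF-8, while describe("€")
reports 3 bytes in UTF-8 but only 2 in UTF-16; the code point itself
(0x20AC) is the same in both cases.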

>> If you want simple byte storage and to hell with proper
>> unicode character collation then use character set OCTETS.
>>
> OCTETS or iso8859_1 it's the same in fact ... still need to go like
> you say in the hell of proper unicode character collation in
> both cases :(

With character set OCTETS no transliteration is applied: the bytes that
come in are stored as is, whatever the connection character set. In
addition, the padding character is 0x00 instead of 0x20.
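
The two differences can be modelled roughly in Python. This is only a
simplified simulation of the behaviour described above, not Firebird
code: the function store_char() and its parameters are hypothetical, and
it handles just the two character sets discussed in this thread.

```python
def store_char(text: str, connection_encoding: str,
               column_charset: str, length: int) -> bytes:
    """Simulate what ends up in a CHAR(length) column (simplified model)."""
    # Bytes as the client sends them in its connection character set.
    incoming = text.encode(connection_encoding)

    if column_charset == "OCTETS":
        # No transliteration: the incoming bytes are kept untouched.
        stored, pad = incoming, b"\x00"
    elif column_charset == "ISO8859_1":
        # Transliterate from the connection character set to the column's.
        stored = incoming.decode(connection_encoding).encode("latin-1")
        pad = b" "  # 0x20
    else:
        raise ValueError("this sketch only models OCTETS and ISO8859_1")

    if len(stored) > length:
        raise ValueError("value does not fit in the column")

    # Fixed-length columns are padded on the right up to the declared length.
    return stored.ljust(length, pad)
```

With a UTF-8 connection, store_char("é", "utf-8", "OCTETS", 4) keeps the
two UTF-8 bytes 0xC3 0xA9 and pads with 0x00, whereas the same call with
"ISO8859_1" stores the single byte 0xE9 followed by three spaces.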

--
Mark Rotteveel