Subject RE: [firebird-support] Storing Delphi 2009 "UnicodeString" into database, UTF8?
Author Svend Meyland Nicolaisen
>UnicodeString is UTF-16, viz., 16-bytes squashed into
>2-byte munchkins, known as "surrogate pairs". It's
>surrogate pairs you're getting into those 2-byte
>widechars. It is not UTF-8. D2009 has an AnsiString
>variant called UTF8String...but I have no idea how
>it maps to strings that Firebird could transliterate
>to UTF8.

UTF-16 encodes Unicode code points below 0x10000 as a single 16-bit (2-byte)
code unit. Code points above 0xFFFF are encoded as surrogate pairs. A
surrogate pair consists of one 16-bit value in the range 0xD800 - 0xDBFF (the
high surrogate) followed by one 16-bit value in the range 0xDC00 - 0xDFFF (the
low surrogate). These two ranges are reserved for surrogates in the Unicode
standard.
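
To make the arithmetic concrete, here is a minimal sketch of how a code point above 0xFFFF splits into a surrogate pair. ToSurrogates is a hypothetical helper written for this illustration, not something from the Delphi RTL, and the code point U+1D11E is just an example value.

```pascal
program SurrogateSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not part of the RTL): splits a code point in the
// range $10000..$10FFFF into its UTF-16 high and low surrogates.
procedure ToSurrogates(CodePoint: Cardinal; out HiSur, LoSur: Word);
var
  V: Cardinal;
begin
  Assert((CodePoint >= $10000) and (CodePoint <= $10FFFF));
  V := CodePoint - $10000;            // 20 significant bits remain
  HiSur := Word($D800 + (V shr 10));  // top 10 bits    -> $D800..$DBFF
  LoSur := Word($DC00 + (V and $3FF)); // bottom 10 bits -> $DC00..$DFFF
end;

var
  HiSur, LoSur: Word;
begin
  ToSurrogates($1D11E, HiSur, LoSur); // U+1D11E MUSICAL SYMBOL G CLEF
  Assert(HiSur = $D834);
  Assert(LoSur = $DD1E);
  Writeln(IntToHex(HiSur, 4), ' ', IntToHex(LoSur, 4)); // prints "D834 DD1E"
end.
```

In a UnicodeString such a character occupies two WideChar elements, so Length counts it as 2.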

Delphi 2009 actually converts the UTF-16 encoded string to UTF-8 when you
assign a UnicodeString to a UTF8String. E.g.:

var
  US: UnicodeString;
  UTF8: UTF8String;
begin
  US:='ÆØÅæøå'; //UTF-16 encoded string.
  UTF8:=US; //Conversion from UTF-16 to UTF-8.
end;
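
A quick way to convince yourself that a real conversion takes place is to compare the lengths and round-trip the string. This is only a sketch: the program name is made up, the characters are written as #$ escapes so the result does not depend on the source file encoding, and explicit casts are used instead of plain assignment purely as a matter of taste.

```pascal
program Utf8LengthSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  US, Back: UnicodeString;
  UTF8: UTF8String;
begin
  US := #$00C6#$00D8#$00C5#$00E6#$00F8#$00E5; // 'ÆØÅæøå'
  UTF8 := UTF8String(US);       // UTF-16 -> UTF-8
  Back := UnicodeString(UTF8);  // UTF-8 -> UTF-16
  Assert(Length(US) = 6);       // six 16-bit code units
  Assert(Length(UTF8) = 12);    // each of these characters takes 2 bytes in UTF-8
  Assert(Back = US);            // the round trip is lossless
  Writeln(Length(US), ' ', Length(UTF8)); // prints "6 12"
end.
```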

You can then copy the contents of the UTF8 string to a buffer, or do whatever
else you need with it. For example:

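var
  US: UnicodeString;
  UTF8: UTF8String;
  S: AnsiString;
begin
  US:='ÆØÅæøå'; //UTF-16 encoded string.
  UTF8:=US; //Conversion from UTF-16 to UTF-8.
  //Length of a UTF8String is its size in bytes. ByteLength takes a
  //UnicodeString, so it would count UTF-16 bytes instead.
  SetLength(S,Length(UTF8));
  CopyMemory(@S[1],@UTF8[1],Length(UTF8)); //CopyMemory is in the Windows unit.
  ShowMessage(S); //Shows the raw UTF-8 bytes read as ANSI, so expect garbled text.
end;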
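
If what you need is a plain byte buffer rather than another string type, something along these lines should work. It is a sketch under the assumption that a TBytes array is an acceptable buffer for whatever you hand the data to. The program name is invented, and TEncoding is used only to check the round trip.

```pascal
program Utf8BufferSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  US, Back: UnicodeString;
  UTF8: UTF8String;
  Buf: TBytes;
begin
  US := #$00C6#$00D8#$00C5;     // 'ÆØÅ'
  UTF8 := UTF8String(US);       // UTF-16 -> UTF-8
  SetLength(Buf, Length(UTF8)); // Length of a UTF8String is its byte count
  if Length(UTF8) > 0 then      // UTF8[1] is not valid on an empty string
    Move(UTF8[1], Buf[0], Length(UTF8));
  Assert(Length(Buf) = 6);
  Assert((Buf[0] = $C3) and (Buf[1] = $86)); // U+00C6 encodes as C3 86
  Back := TEncoding.UTF8.GetString(Buf);     // decode the buffer again
  Assert(Back = US);
  Writeln(Length(Buf)); // prints "6"
end.
```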

/Svend