Subject | RE: [firebird-support] Storing Delphi 2009 "UnicodeString" into database, UTF8? |
---|---|
Author | Svend Meyland Nicolaisen |
Post date | 2009-05-19T14:28:45Z |
>UnicodeString is UTF-16, viz., 16-bytes squashed into
>2-byte munchkins, known as "surrogate pairs". It's
>surrogate pairs you're getting into those 2-byte
>widechars. It is not UTF-8. D2009 has an AnsiString
>variant called UTF8String...but I have no idea how
>it maps to strings that Firebird could transliterate
>to UTF8.

UTF-16 encodes Unicode code points below 0x10000 as a single 16-bit (2-byte)
code unit. Code points above 0xffff are encoded as surrogate pairs. A
surrogate pair consists of one 16-bit value in the range 0xd800 - 0xdbff
(the high surrogate) followed by one 16-bit value in the range
0xdc00 - 0xdfff (the low surrogate). These two ranges are reserved for
surrogate pairs in the Unicode standard.
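
To make the arithmetic concrete, here is a minimal sketch of how a code point above 0xffff is split into a surrogate pair. The procedure name CodePointToSurrogates and the variable names are made up for this illustration and are not part of the Delphi RTL; the caller is assumed to pass a code point in the range 0x10000 - 0x10ffff.

```pascal
program SurrogateDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: splits a code point in $10000..$10FFFF
// into its UTF-16 high and low surrogate.
procedure CodePointToSurrogates(CP: Cardinal; out HighSur, LowSur: Word);
begin
  Dec(CP, $10000);                          // leaves a 20-bit value
  HighSur := Word($D800 or (CP shr 10));    // upper 10 bits
  LowSur  := Word($DC00 or (CP and $3FF));  // lower 10 bits
end;

var
  HighSur, LowSur: Word;
begin
  // U+1D11E (MUSICAL SYMBOL G CLEF)
  CodePointToSurrogates($1D11E, HighSur, LowSur);
  Writeln(IntToHex(HighSur, 4), ' ', IntToHex(LowSur, 4)); // prints D834 DD1E
end.
```

Everything in the Basic Multilingual Plane, including the Danish letters used in the examples here, fits in a single 16-bit code unit and never needs a surrogate pair.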
Delphi 2009 actually converts UTF-16 encoded Unicode strings to UTF-8
when you assign a UnicodeString to a UTF8String. E.g.:
var
  US: UnicodeString;
  UTF8: UTF8String;
begin
  US := 'ÆØÅæøå'; // UTF-16 encoded string.
  UTF8 := US;     // Transliteration from UTF-16 to UTF-8.
end;
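
One way to see that a conversion really took place is to compare the lengths of the two strings. The sketch below assumes Delphi 2009 or later, where Length counts 16-bit code units for a UnicodeString and bytes for a UTF8String. The string is built from character codes so the result does not depend on how the source file is saved.

```pascal
program Utf8LengthDemo;
{$APPTYPE CONSOLE}
var
  US: UnicodeString;
  UTF8: UTF8String;
begin
  US := #$00C6#$00D8#$00C5#$00E6#$00F8#$00E5; // 'ÆØÅæøå'
  UTF8 := US;            // implicit UTF-16 to UTF-8 conversion
  Writeln(Length(US));   // 6: six UTF-16 code units
  Writeln(Length(UTF8)); // 12: each of these letters takes two bytes in UTF-8
end.
```

The compiler may emit an implicit string cast warning on the assignment; the conversion still happens.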
You can then copy the contents of the UTF8 string to a buffer, or do whatever
else you want with it. For example:
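var
  US: UnicodeString;
  UTF8: UTF8String;
  S: AnsiString;
begin
  US := 'ÆØÅæøå'; // UTF-16 encoded string.
  UTF8 := US;     // Transliteration from UTF-16 to UTF-8.
  // Length of a UTF8String is its size in bytes. ByteLength is declared
  // for string (UnicodeString), so it would convert back to UTF-16 first.
  SetLength(S, Length(UTF8));
  CopyMemory(@S[1], @UTF8[1], Length(UTF8)); // CopyMemory is in the Windows unit.
  ShowMessage(S); // Shows the raw UTF-8 bytes read as ANSI characters.
end;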
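
If what you need is a plain byte buffer rather than another string, the same copy can be done into a TBytes array. This is only a sketch: the function name ToUtf8Bytes is made up for this example, and how the bytes are then handed to your Firebird access components is not shown, since that depends on the components you use.

```pascal
program Utf8BufferDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils;

// Hypothetical helper: returns the UTF-8 bytes of a UnicodeString.
function ToUtf8Bytes(const US: UnicodeString): TBytes;
var
  UTF8: UTF8String;
begin
  UTF8 := US;                        // UTF-16 to UTF-8 conversion
  SetLength(Result, Length(UTF8));   // Length is the byte count here
  if Length(UTF8) > 0 then           // avoid indexing an empty string
    Move(UTF8[1], Result[0], Length(UTF8));
end;

var
  Buf: TBytes;
  I: Integer;
begin
  Buf := ToUtf8Bytes(#$00C6#$00D8);  // 'ÆØ'
  for I := 0 to High(Buf) do
    Write(IntToHex(Buf[I], 2), ' '); // writes C3 86 C3 98
  Writeln;
end.
```

Move comes from the System unit, so this version does not need the Windows unit.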
/Svend