Subject Re: [firebird-support] Writing UTF16 to the database
Author Scott Morgan
Lester Caine wrote:

>OK from discussions elsewhere ...
>UTF-8 can take up to 6 bytes to store data, UTF-16 needs 2 or 4 bytes to
>store the SAME data. The difference being that the bytes used to flag
>the need for an EXTRA byte are rather wasteful in UTF-8.
>The RAW data is only 21/22 bits so CAN be stored in three bytes. UTF-32
>uses four bytes, but the fourth byte is always empty, so internal
>storage as three bytes does not cause any problems.
>The FSS comes from File System Safe and just means that '00' bytes that
>would form part of a UTF-8 sequence are removed so that '00' can be used
>as the final byte of the string.
>
>

UNICODE_FSS _is_ UTF-8; it's just an old name for it. We've been over
this several times.

http://www.unicode.org/glossary/#FSS_UTF

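The File System Safe part of the name refers to how UTF-8 is built: every
byte of a multibyte sequence has its high bit set, so bytes in the ASCII
range (including 0x00 and '/') only ever appear as themselves. Nothing
gets "removed"; the only UTF-8 char encoding with 0x00 in it is 'NUL'.
Being byte-oriented, it also has no endian issue like the 16 and 32 bit
encodings have.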
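To make the byte-level points in this thread concrete, here is a minimal Python sketch (the helper name describe and the sample characters are purely illustrative). It relies on Python's built-in codecs, which follow the current UTF-8 definition in RFC 3629: at most 4 bytes per code point, since the 5 and 6 byte forms of the original FSS-UTF design are no longer valid. UTF-16 is encoded with an explicit byte order so that the endian difference is visible.

```python
# Compare how a few code points are stored in UTF-8 and in UTF-16.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, an emoji


def describe(ch: str) -> dict:
    utf8 = ch.encode("utf-8")
    le = ch.encode("utf-16-le")  # UTF-16, little-endian, no BOM
    be = ch.encode("utf-16-be")  # UTF-16, big-endian, no BOM
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8_len": len(utf8),
        "utf16_len": len(le),
        # Every byte of a multibyte UTF-8 sequence is >= 0x80 ...
        "utf8_multibyte_high_bit": len(utf8) == 1 or all(b >= 0x80 for b in utf8),
        # ... so a 0x00 byte can only come from U+0000 itself.
        "utf8_has_zero": 0 in utf8,
        "utf16_has_zero": 0 in le,
        # The same UTF-16 text has two possible byte orders.
        "utf16_order_differs": le != be,
    }


if __name__ == "__main__":
    for ch in SAMPLES:
        print(describe(ch))
```

For the emoji, UTF-16 needs a surrogate pair (4 bytes), and that pair (0xD83D 0xDE00) happens to contain a 0x00 byte as well, so UTF-16 text cannot safely be treated as a NUL-terminated byte string even outside the ASCII range.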

Apart from the lack of endian issues, the other reason to use UTF-8 (FSS)
is that the API (fbclient.dll) functions only pass and receive C char
strings, not wide chars (wchar_t or similar). So if you wanted to pass a
wide char string to the API, a) you'd have to somehow tell the API that
you were passing a wide char string so that it knew how to handle it, and
b) you'd have to cast the string pointer from a wchar_t* (or similar) to
a char*, which is generally very bad practice, not least because anything
treating it as a C string stops at the first 0x00 byte, and wide char
text is full of them.
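The effect of handing wide char data to something that reads a plain char* can be simulated from Python with the standard ctypes module. This is only a sketch: seen_through_char_ptr is a hypothetical helper, and no Firebird function is called. It copies the bytes into a C char array and reads them back the way C string functions do, up to the first 0x00. UTF-16LE stands in for a Windows wchar_t string (wchar_t is 16-bit UTF-16 there; on most Unix-like systems it is 32 bits wide).

```python
import ctypes


def seen_through_char_ptr(raw: bytes) -> bytes:
    # create_string_buffer copies `raw` into a C char array plus a trailing NUL;
    # .value reads it back like a C string, stopping at the first 0x00 byte.
    return ctypes.create_string_buffer(raw).value


if __name__ == "__main__":
    text = "Firebird"
    print(seen_through_char_ptr(text.encode("utf-16-le")))  # b'F'
    print(seen_through_char_ptr(text.encode("utf-8")))      # b'Firebird'
```

The UTF-16 string is cut off after its first character because 'F' is stored as 0x46 0x00, whereas the UTF-8 bytes survive intact (non-ASCII text included) and can be decoded back to the original string.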

More likely, some future version of FB (3, 4? maybe even later) will add
a specialised wide char version of the FB API, but I wouldn't hold my
breath on that one.

Scott