firebird-architect - Re: UTF-8 vs UTF-16

Subject	Re: UTF-8 vs UTF-16
Author	mailmur
Post date	2003-08-22T10:11:36Z

> * Java encoding "UTF-16" produces UTF-16BE
> * Java encoding "Unicode" produces UTF-16LE
> * Win2k Notepad's "Unicode" save produces UTF-16LE. It has an
> additional "Unicode big endian" Save As format for UTF-16BE

Sorry, I did not say whether those tests produced a BOM (byte order
mark) at the start of the file.
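
One way to check a test file for a BOM is to look at its first bytes. The following is only a minimal sketch: the class and method names are made up for illustration, and it recognises just the UTF-8 and UTF-16 marks (UTF-32 is ignored).

```java
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    /** Returns the encoding implied by a leading BOM, or "none" if no BOM is found. */
    public static String detectBom(byte[] data) {
        // EF BB BF is the UTF-8 encoded form of U+FEFF
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        // FF FE is U+FEFF written in little-endian byte order
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        // FE FF is U+FEFF written in big-endian byte order
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        return "none";
    }

    public static void main(String[] args) {
        // Java's "UTF-16" encoder writes a big-endian BOM first
        System.out.println(detectBom("A".getBytes(StandardCharsets.UTF_16)));
        // The explicit LE/BE charsets write no BOM when encoding
        System.out.println(detectBom("A".getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

To test a saved file, the bytes could be read with java.nio.file.Files.readAllBytes and passed to detectBom.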

Additional tests:

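dotNET csharp:
// the third argument is one of Encoding.Unicode, Encoding.UTF8 or
// Encoding.BigEndianUnicode; each one was tested in turn
StreamWriter writer = new StreamWriter("data.txt", false,
System.Text.Encoding.Unicode, 8 * 1024);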

* System.Text.Encoding.Unicode = UTF-16LE with BOM
* System.Text.Encoding.BigEndianUnicode = UTF-16BE with BOM
* System.Text.Encoding.UTF8 = UTF-8 with BOM

I think Java and the dotNET languages will produce the same byte order
on all operating systems for each charset name specified. So Java
"Unicode" is always UTF-16LE. The dotNET documentation states explicitly
that "Unicode" uses little-endian byte order.
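
The byte order of a given charset can be checked on any platform by dumping the encoded bytes. This is a rough sketch with made-up class and method names. It uses only the explicitly named charsets from java.nio.charset.StandardCharsets and deliberately leaves out the legacy "Unicode" name, whose mapping is best verified on the JVM actually in use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EndianDemo {
    /** Encodes text with the given charset and returns the bytes as spaced hex. */
    public static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-16LE: " + hex("A", StandardCharsets.UTF_16LE)); // 41 00
        System.out.println("UTF-16BE: " + hex("A", StandardCharsets.UTF_16BE)); // 00 41
        System.out.println("UTF-16:   " + hex("A", StandardCharsets.UTF_16));   // FE FF 00 41
    }
}
```

The result depends only on the charset requested, not on the byte order of the machine running the code.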

I wish we could have alias names for the commonly used Java/csharp
encoding names (UTF-8, Unicode) to make life easier on the client side,
and then map the "Unicode" encoding to UTF-16LE byte order regardless
of platform.
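
The alias idea could look roughly like the lookup table below. Everything here is hypothetical: the class name, the set of aliases, and the choice to resolve to Java charsets rather than to Firebird character set names are assumptions made only to illustrate the mapping.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class EncodingAliases {
    // Hypothetical alias table; keys are stored in lower case
    private static final Map<String, Charset> ALIASES = new HashMap<>();

    static {
        ALIASES.put("utf-8", StandardCharsets.UTF_8);
        ALIASES.put("utf8", StandardCharsets.UTF_8);
        // "Unicode" is pinned to little-endian regardless of platform
        ALIASES.put("unicode", StandardCharsets.UTF_16LE);
    }

    /** Resolves a client-side encoding name, ignoring case. */
    public static Charset resolve(String name) {
        Charset cs = ALIASES.get(name.toLowerCase(Locale.ROOT));
        if (cs == null) {
            throw new IllegalArgumentException("Unknown encoding alias: " + name);
        }
        return cs;
    }

    public static void main(String[] args) {
        System.out.println(resolve("Unicode")); // UTF-16LE
        System.out.println(resolve("UTF-8"));   // UTF-8
    }
}
```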

The dotNETProvider driver uses "charset=" as its connection string
parameter name. Maybe the Jaybird coders could add it as an alias for
the current "lc_ctype" name. It is a more understandable URL parameter
name.
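
A parameter alias like this could be handled by normalising the connection properties before they are used. The sketch below is not Jaybird code. The class and method are hypothetical, and it assumes that an explicit "lc_ctype" should win when both names are supplied.

```java
import java.util.Properties;

public class CharsetParamAlias {
    /** Returns a copy of the properties with "charset" folded into "lc_ctype". */
    public static Properties normalize(Properties in) {
        Properties out = new Properties();
        out.putAll(in);
        String charset = out.getProperty("charset");
        // Assumption: an explicit lc_ctype takes precedence over the alias
        if (charset != null && out.getProperty("lc_ctype") == null) {
            out.setProperty("lc_ctype", charset);
        }
        // Drop the alias so only the canonical name is passed on
        out.remove("charset");
        return out;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("charset", "UNICODE_FSS");
        System.out.println(normalize(props).getProperty("lc_ctype")); // UNICODE_FSS
    }
}
```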