Subject | Re: UTF-8 vs UTF-16 |
---|---|
Author | mailmur |
Post date | 2003-08-22T10:11:36Z |
> * Java encoding "UTF-16" produces UTF-16BE
> * Java encoding "Unicode" produces UTF-16LE
> * Win2k Notepads "Unicode" save produces UTF-16LE. It has an
> additional "Unicode big endian" saveAs format for UTF-16BE
Sorry, I did not post whether those tests produced a BOM marker at
the start of the file.
Additional tests:
dotNET csharp:
// the encoding argument was one of Unicode, UTF8, BigEndianUnicode
StreamWriter writer = new StreamWriter("data.txt", false,
System.Text.Encoding.Unicode, 8 * 1024);
* System.Text.Encoding.Unicode = UTF-16LE with BOM
* System.Text.Encoding.BigEndianUnicode = UTF-16BE with BOM
* System.Text.Encoding.UTF8 = UTF-8 with BOM
I think Java and dotNET will produce the same byte order on every
operating system for a given charset name. So Java "Unicode"
is always UTF-16LE. The dotNET documentation says explicitly
that "Unicode" uses little-endian byte order.
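The byte order and BOM for a given charset name can be checked directly by dumping the encoded bytes. Below is a minimal sketch, assuming a current JDK with `java.nio.charset.StandardCharsets`; the class name `BomCheck` and its helpers are made up for illustration, and it uses only the standard charset names rather than the legacy "Unicode" name discussed above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomCheck {
    // Render bytes as space-separated uppercase hex, e.g. "FE FF 00 41".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode the text with the given charset and return the hex dump.
    static String dump(String text, Charset cs) {
        return hex(text.getBytes(cs));
    }

    public static void main(String[] args) {
        String text = "A"; // U+0041, one 16-bit code unit
        System.out.println("UTF-16   : " + dump(text, StandardCharsets.UTF_16));   // FE FF 00 41
        System.out.println("UTF-16LE : " + dump(text, StandardCharsets.UTF_16LE)); // 41 00
        System.out.println("UTF-16BE : " + dump(text, StandardCharsets.UTF_16BE)); // 00 41
        System.out.println("UTF-8    : " + dump(text, StandardCharsets.UTF_8));    // 41
    }
}
```

A leading `FE FF` is a big-endian BOM and `FF FE` would be a little-endian one. Running the same class on different operating systems is a quick way to confirm that the output does not depend on the platform.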
I wish we had alias names for the commonly used Java/csharp
encoding names (UTF-8, Unicode) to make life easier on the client side,
and then mapped the "Unicode" encoding to UTF-16LE byte order
independently of the platform.
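Such an alias table could look roughly like the sketch below. This is only an illustration of the idea, not existing driver code: the class `CharsetAliases`, the `resolve` method and the chosen alias spellings are all hypothetical, and the targets are plain Java charsets rather than server-side character set names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CharsetAliases {
    // Hypothetical alias table: friendly client-side names mapped to a
    // fixed, platform-independent charset.
    private static final Map<String, Charset> ALIASES = new HashMap<>();
    static {
        ALIASES.put("unicode", StandardCharsets.UTF_16LE);          // always little-endian
        ALIASES.put("bigendianunicode", StandardCharsets.UTF_16BE);
        ALIASES.put("utf8", StandardCharsets.UTF_8);
        ALIASES.put("utf-8", StandardCharsets.UTF_8);
    }

    // Look up an alias case-insensitively; fall back to the JDK's own
    // charset lookup for any name that is not in the alias table.
    static Charset resolve(String name) {
        Charset cs = ALIASES.get(name.trim().toLowerCase(Locale.ROOT));
        return cs != null ? cs : Charset.forName(name);
    }

    public static void main(String[] args) {
        System.out.println(resolve("Unicode")); // UTF-16LE
        System.out.println(resolve("UTF-8"));   // UTF-8
    }
}
```

Checking the alias table first means "Unicode" resolves the same way everywhere, whatever the runtime itself would do with that name.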
The dotNETProvider driver uses "charset=" as the connection string
parameter name. Maybe the Jaybird coders could add it as an alias for
the current "lc_ctype" name. It is a more understandable URL parameter name.
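To illustrate what I mean by an alias, here is a rough sketch of how a driver could accept "charset" and treat it as "lc_ctype" before connecting. It is not Jaybird code: the class `ConnectionParamAliases` and the `normalize` method are hypothetical, and the choice that an explicit "lc_ctype" wins when both are given is just one possible rule.

```java
import java.util.Properties;

public class ConnectionParamAliases {
    // Return a copy of the connection properties in which the hypothetical
    // "charset" alias has been folded into the existing "lc_ctype" name.
    static Properties normalize(Properties in) {
        Properties out = new Properties();
        out.putAll(in);
        String charset = out.getProperty("charset");
        // An explicit lc_ctype takes precedence over the alias.
        if (charset != null && out.getProperty("lc_ctype") == null) {
            out.setProperty("lc_ctype", charset);
        }
        // Drop the alias so later code only ever sees lc_ctype.
        out.remove("charset");
        return out;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("charset", "UNICODE_FSS"); // example value only
        System.out.println(normalize(props).getProperty("lc_ctype")); // UNICODE_FSS
    }
}
```

The caller's properties are left untouched, and everything after the normalize step can keep reading "lc_ctype" as before.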