Subject Re: Character Sets
Author Roman Rokytskyy <rrokytskyy@acm.org>
> We have encountered a strange problem with our app. Firebird 1.0,
> Red Hat 8.0, Sun JDK 1.3.1 on a dual-processor Dell. The database
> uses charset none for all the char and text blob fields.
>
> We basically are generating web pages. However, we have started
> to encounter strange behaviour on the above machine. Our app has
> run successfully on Windows 98/2000/XP and Mandrake 8.x. We moved
> it to the Red Hat box and have encountered a problem.
>
> If we use the interclient 1.6, if a field in the database
> contains a copyright sign, the output from the web server prefixes
> it with a Â.
>
> If we use the Firebird driver the output is truncated at the
> copyright character.
>
> Digging back into the code it seems that it is the rs.getString
> which is causing the problem.
>
> Anyone got any ideas what can be causing this? We've got our app
> running on 5 other linux servers with no problems? Are there any
> character set configuration options in Java?

Hmmm... strange. One of the most common mistakes is using
new String(byte[]) to convert byte arrays into strings. This
constructor assumes that the bytes in the array are in the VM's
_default_ encoding (system property "file.encoding"). On Windows the
VM takes its default encoding from the regional settings, but on
Linux it takes it from LC_CTYPE (usually "C").
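
To illustrate the difference, here is a minimal sketch (not code from
your application or from either driver). It decodes the UTF-8 bytes of
the copyright sign three ways. The class name DecodeDemo is made up
for the example, and StandardCharsets needs Java 7 or later; on an
older VM such as 1.3.1 you would pass the encoding name as a string,
e.g. new String(bytes, "ISO-8859-1"). Decoding UTF-8 bytes as
ISO-8859-1 gives "Â©", which looks like the InterClient symptom, though
that is only a guess at the cause.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decode with an explicitly named charset, so the result does not
    // depend on the VM default ("file.encoding").
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The copyright sign (U+00A9) encoded as UTF-8 is two bytes: C2 A9.
        byte[] utf8 = {(byte) 0xC2, (byte) 0xA9};

        // Implicit form: the result varies with the VM default encoding.
        String implicit = new String(utf8);

        // Explicit forms: the result is the same on every machine.
        String asUtf8 = decode(utf8, StandardCharsets.UTF_8);        // U+00A9
        String asLatin1 = decode(utf8, StandardCharsets.ISO_8859_1); // U+00C2 U+00A9

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("implicit length: " + implicit.length());
        System.out.println("UTF-8 length:    " + asUtf8.length());
        System.out.println("Latin-1 length:  " + asLatin1.length());
    }
}
```

If "implicit length" differs between your Red Hat box and the other
servers, the default encoding is the likely culprit.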

With your symptoms I would normally say "check your application for
this code", but that assumes JayBird itself works correctly. If you
set the client encoding in JayBird (the "lc_ctype" connection
property, whose value is one of the character sets available in the
database), it will convert data between the client and server
character sets and return a correct string. If you do not specify
this property, the VM's default encoding is used. For InterClient
the property is called "charSet" and its value is a standard Java
encoding name.
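
As a rough sketch, the two properties could be passed like this. Only
the property names "lc_ctype" and "charSet" come from the drivers; the
class and method names are made up for the example, "user" and
"password" are the usual JDBC property names, and WIN1252 / Cp1252 are
example values for Western European data. The JDBC URL depends on your
driver and database location, so it is left as a parameter.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

class EncodingProps {
    // JayBird: "lc_ctype" takes a Firebird character set name, e.g. WIN1252.
    static Properties jaybird(String user, String password, String firebirdCharset) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("lc_ctype", firebirdCharset);
        return props;
    }

    // InterClient: "charSet" takes a Java encoding name, e.g. Cp1252.
    static Properties interClient(String user, String password, String javaEncoding) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("charSet", javaEncoding);
        return props;
    }

    // Opens a connection with the given properties. The url argument is
    // whatever JDBC URL your driver expects; the driver must be loaded.
    static Connection open(String url, Properties props) throws SQLException {
        return DriverManager.getConnection(url, props);
    }
}
```

For example, EncodingProps.open(url, EncodingProps.jaybird(user,
password, "WIN1252")) with your own URL and credentials.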

You can try:

a) Specify the default encoding for the VM (-Dfile.encoding=Cp1252
for Western European regional settings) or change the LC_CTYPE
setting on your Linux box.

b) Specify the "lc_ctype" or "charSet" property and check what you
get.
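
To check what option (a) actually gives you, a small diagnostic like
the following can be run on each server with the same JVM and startup
flags as the web application. It is only a sketch: the class name is
made up, and Charset.defaultCharset() needs Java 5 or later (on 1.3.1
you would print only the "file.encoding" property).

```java
import java.nio.charset.Charset;

class DefaultEncodingCheck {
    // Reports the encoding the VM uses when none is named explicitly.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Comparing the output between the Red Hat box and one of the working
servers should show whether the defaults differ.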

Anyway, I would be very grateful if you could provide a reproducible
test case. Frankly speaking, I was sure that JayBird handled charsets
correctly, but your report means that something might be wrong.

Best regards,
Roman Rokytskyy