Subject Re: [Firebird-Java] By INSERT and UPDATE character set defined needed
Author Rick Fincher
OK, good idea.

Rick

----- Original Message -----

> THNX it solved my problem,.... bur as you asked for input,... maybe you
gan
> use this as example in the FAQ
>
>
> Properties p = new Properties();
> p.put("user","username");
> p.put("password","userpassword");
> p.put("lc_ctype","WIN1251");
> Class.forName("org.firebirdsql.jdbc.FBDriver");
> Connection conn =
>
DriverManager.getConnection("jdbc:firebirdsql:127.0.0.1/3050:C:\\Database\\d
> bname.gdb",p);
>
>
>
> But Again THNX
>
> Richard Drent
>
> ##################################################################
>
> Hi Richard,
>
> The short answer is that you have to open the connection with lc_ctype set
> to the same character set that the table is.
>
> For more detailed info, see below.
>
> I just added the following to the FAQ. It is a compendium of the
character
> set questions. See if this makes sense to you. Feedback from any and all
> would be appreciated.
>
> Rick
> 46- Can you explain how character sets work?
>
> return to top
>
> Character handling with JayBird can be confusing because the Java VM,
> Firebird database, browser, and JayBird Connection all have a charset
> associated with them.
>
> Also, the Firebird server attempts to transliterate the internal charset
of
> a database to the charset specified in the connection. JayBird also
attempts
> to translate the JVM charset to the Firebird server charsets specfied in
the
> connection.
>
> With Firebird the character encoding of the text data stored in the
database
> is set when the database is created. That applies to the char and varchar
> columns, and type 1 blobs (text blobs). You can override the default
charset
> for columns with the appropriate SQL commands when the columns are
created.
> Be careful if you do this or you may end up with two columns in the same
> table that you can't read with the same connection.
>
> The only situation when this problem can happen is when you have a table
> with columns that have the "NONE" character set and some other character
set
> ("UNICODE_FSS", "WIN1252", etc). The server tries to convert characters
from
> the encoding specified for the column into the encoding specified for
> connection. The "NONE" character set allows only one-way conversion: from
> <any> to "NONE". In this case server simply returns you bytes written in
the
> database. So if you have table:
>
> CREATE TABLE charset_table(
> col1 VARCHAR(10) CHARACTER SET WIN1252,
> col2 VARCHAR(10) CHARACTER SET NONE
> )
>
> you will not be able to modify both columns in the same SQL statement, and
> it does not matter whether you use "NONE", "WIN1252" or " UNICODE_FSS" for
> the connection.
>
> The only possible way to solve this problem is to use character set
> "OCTETS". This is some kind of artificial character set, similar to "NONE"
> (data are written and read as byte arrays), however there exist
> bi-directional translation rules between any character set (incl."NONE")
and
> "OCTETS". You can specify "OCTETS" for connection and then decode byte
> arrays you receive from the server yourself, the driver will do
> byte-array-to-string conversion incorrectly, since it does not get a hint
> about the character set from the server.
>
> Let's say your program or web app prompts the user to type a string. If
the
> user types "abc" into a Java text box in your app, or into a text box in a
> browser for a web app, Java creates a string for that 3 character string
in
> Unicode. So the string would actually have 9 bytes in it- three Unicode
> characters of three bytes each.
>
> INSERTing this in a text column in the database would result in nine bytes
> being inserted without translation.
>
> Note: You have to pass correct unicode strings to the driver. What is
> "correct unicode" string? It is easier to explain what is not a correct
> unicode string.
>
> Let's assume you have normal text file in WIN1251 encoding. In this case
> cyrillic characters from the unicode table (values between 0-65535) are
> mapped into the characters with values 0-255. However, your regional
> settings say that you're in Germany. This means that file.encoding will be
> set to Cp1252 on JVM start. If you now open the file and construct a
reader
> without specifying that character encoding is Cp1251, Java will read the
> file and construct your strings. However all cyrillic characters will be
> replaced by characters from the Cp1252 encoding that have the same number
> representation as the cyrillic ones.
>
> These strings are valid unicode strings, however their content is not the
> content you read from the file. Interestingly enough, if you write such
> strings back into the file, and open it in some text editor saying that
this
> is WIN1251 text, you will see correct text.
>
> If you open your connection to the database without specifying a character
> set (lc_ctype) it defaults to NONE and no translation is done. So when you
> SELECT the previously inserted data from the database and display it in
your
> program you get the same string you entered, right? Well, not necessarily.
>
> You will get a string with the same nine bytes in it that were stored, but
> if the user getting that string from the database has a different default
> charset in his Java VM those bytes will display differently.
>
> The JVM usually picks up its locale dependent character encoding from the
> underlying operating system, but it can also be set when you invoke the
JVM
> by using -Dfile.encoding=Cp1252, for example. If you attempt to display
> characters that aren't in your default JVM encoding they apear as '?'.
>
> The only way to insure you always get back what you put in is to create
the
> database with a charset and set lc_ctype to the same charset when you open
> the connection to that database.
>
> If want to use charsets other than NONE and you have lots of data in
> databases with a charset of NONE, you may have to set up a new database
with
> a different charset and use a data pump to transfer data over, or write a
> small program to do the transfer.
>
> Using UNICODE_FSS works well nearly everywhere but may increase the size
of
> your databases if you have lots of text because Unicode uses characters up
> to 3 bytes long.
>
> There are some limitations regarding UNICODE_FSS character set: there's
only
> one collation, where strings are sorted by the natural order, and not
> collation rules for different languages; there are some issues when
> converting them to upper case, etc. More information on these anomalies
can
> be found in the Firebird-Support group.
>
> Again, the default charset for Firebird is NONE. The Firebird server does
no
> translation with this charset.
>
> If your database has a charset of NONE and you set a charset type on the
> connection (lc_ctype) that is not NONE, you can write to the database but
> you can't read from it without getting the "Cannot transliterate between
> character sets" exception.
>
> Let's follow a string as it gets inserted into the database and later
> selected from the database. For this example we will set the database
> charset to NONE.
>
> See the freely available Interbase 6 PDF manuals "Data Definition Guide"
for
> a list of charsets available.
>
> The WIN125X or ISO8859_1 charsets may be a good choice for you if you need
> the non-English characters but want the compactness of the 1 byte
> characters. With these char sets you can specify many different national
> language collation orders in the ORDER BY clause of the SELECT statement.
>
> Let's look at the same example above, but this time we will insert into a
> database that has been created with a charset of WIN1251.
>
> When you open the connection to the database you set the lc_ctype=WIN1251.
> Then insert the string 'abc' into the appropriate column. JayBird has to
> take the Unicode encoded Java String "abc"and convert it to WIN1251 format
> and send it to the database server for insertion. Since the database is
> already in WIN1251 format, the server does not have to translate. When the
> string is read back from the database it is converted back to the Java VM
> format by JayBird.
>
> It is also possible to set an lc_ctype in a connection that is different
> from the charset of the database. This lets the database server do the
> translating from one charset to another. This is a feature of the Firebird
> server that lets programming languages or programs that require specific
> character formats to connect to the database without requiring the data to
> be stored in that format.
>
> You can also avoid problems by using java.sql.PreparedStatement instead of
> java.sql.Statement and not building SQL strings out of concatenated Java
> strings. For example:
>
> String sqlString, firstName="John", lastName="O'Neal";
>
> sqlString = "INSERT INTO nameTable (LNAME, FNAME)
> VALUES('"+lastName+"','"+firstName+"')";
>
> Statement stmt = connection.createStatement();
> int insertedRows = stmt.executeUpdate(sqlString);
>
> The problem here is that if the user types in data for these strings you
> might end up with illegal characters, or the translation might not be
> correct.
>
> In the example above, the following illegal SQL string would be generated
> and cause an exception to be thrown because of the apostrophe in O'Neil:
>
> INSERT INTO nameTable (LNAME, FNAME) VALUES('O'Neal', 'John')
>
> To avoid this, use a prepared statement like in the example below.
>
> PreparedStatement stmt = connection.prepareStatement("INSERT INTO
> nameTable(LNAME,FNAME) VALUES(?, ?)");
>
> stmt.setString(1, lastName);
> stmt.setString(2, firstName);
>
> int insertedRows = stmt.executeUpdate();
>
> if (insertedRows != 1)
> throw new MyInsertFailedException("Could not insert data");
>
>
>
> ----- Original Message -----
> From: "Richard Drent" <richard@...>
> To: <Firebird-Java@yahoogroups.com>
> Sent: Wednesday, January 14, 2004 4:42 PM
> Subject: [Firebird-Java] By INSERT and UPDATE character set defined needed
>
>
> Hi to all,
>
> I need to specify the character set by inserting and updating fields.
>
> If I insert/update a record by IBManager`, I can INSERT/UPDATE èéáà etc
etc.
> if I do this by JSP it isn't working.
>
> So how can I define the character set?
>
> Greetz
>
> Richard Drent
>
>
>
>
>
>
>
> Error msg;
>
>
>
> javax.servlet.ServletException: GDS Exception. 335544321. arithmetic
> exception, numeric overflow, or string truncation
> Cannot transliterate character between character sets
> at
>
org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImp
> l.java:463)
> at
>
org.apache.jsp.upd_0005fcompany$jsp._jspService(upd_0005fcompany$jsp.java:30
> 2)
> at
> org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:107)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
>
>
>
>
>
>
>
> Yahoo! Groups Links
>
> To visit your group on the web, go to:
> http://groups.yahoo.com/group/Firebird-Java/
>
> To unsubscribe from this group, send an email to:
> Firebird-Java-unsubscribe@yahoogroups.com
>
> Your use of Yahoo! Groups is subject to:
> http://docs.yahoo.com/info/terms/
>
>
>
>
>
>
> Yahoo! Groups Links
>
> To visit your group on the web, go to:
> http://groups.yahoo.com/group/Firebird-Java/
>
> To unsubscribe from this group, send an email to:
> Firebird-Java-unsubscribe@yahoogroups.com
>
> Your use of Yahoo! Groups is subject to:
> http://docs.yahoo.com/info/terms/
>
>
>