Subject | Re: [Firebird-Java] Re: lc_ctype support |
---|---|
Author | David Jencks |
Post date | 2002-04-09T12:53:19Z |
I can't say I understand this argument very well, or at all; however, I'm
finding Roman's side more convincing.
I was under the impression that all Java strings were unicode, and there
was no choice in this. Therefore I thought the only problem was how to
send unicode strings to firebird, depending on the firebird db and column
char sets. I think problems of properly getting national characters into
firebird from sources such as web pages should not be the firebird driver's
problem. From this point of view I don't really understand why you should
need to specify any client encoding, since I thought it was always unicode.
A new, really simple, example might help me.
In any case, when can I release a binary version of the driver?
Thanks
david jencks
On 2002.04.09 08:15:51 -0400 rrokytskyy wrote:
> > Yes, I understand this, but you should also understand that we're
> > speaking about TWO encodings. When lc_ctype is WIN1250 I can send
> > 0xF5 to the driver, when lc_ctype is different (e.g. NONE, or
> > WIN1252) I can not. Problem is there is no way to pass 0xF5 *to the
> > JDBC driver*, because it assumes my string is encoded (but this is
> > a different encoding from lc_ctype) using Cp1250.
>
> Why do you need to pass 0xF5 to driver? You have to pass 0x151 that
> is obtained by
>
> unicodeStr = new String(win1250Str.getBytes(), "Cp1250");
>
> where win1250Str contains 0xF5. The driver assumes that your string is
> _unicode_. If 0xF5 is replaced by '?' during the conversion from unicode
> to win1251, well... that was a wrong string.
>
> > Java encodings convert Unicode characters to 8-bit characters, but
> > lc_ctype (the encoding specified for Firebird) *does not do* any
> > conversion. That is an interpretation, not a conversion. When I set
> > lc_ctype to WIN1250, I specify ONLY which characters are allowed in an
> > 8-bit byte array, and how to order strings containing those
> > characters.
> >
> > That is the difference.
>
> Sorry, I do not understand you. Firebird does perform the conversion
> from the encoding of your connection to the encoding of the column. In
> my test case for Ukrainian, I pass the string to the DBMS in WIN1251
> encoding and it is correctly converted by the DBMS to UNICODE_FSS.
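The server-side conversion in the Ukrainian test case can be mimicked in Java as a rough sketch: decode the bytes with the connection charset, then re-encode them with the column charset. This assumes UNICODE_FSS behaves like UTF-8 for these characters; it is an illustration of the idea, not Firebird's actual transliteration code, and `win1251ToUtf8` is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {
    // Hypothetical stand-in for the server's WIN1251 -> UNICODE_FSS step:
    // interpret the client bytes as windows-1251, then store them as UTF-8.
    static byte[] win1251ToUtf8(byte[] win1251Bytes) {
        String text = new String(win1251Bytes, Charset.forName("windows-1251"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] client = { (byte) 0xF5 };        // Cyrillic 'х' (U+0445) in WIN1251
        byte[] stored = win1251ToUtf8(client);

        StringBuilder hex = new StringBuilder();
        for (byte b : stored) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // D1 85
    }
}
```

The same input byte would mean a different character under another connection charset, which is why the connection charset has to be declared correctly.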
>
> > Ok, but how can I pass a correct unicode string if the encoding of
> > a column is different from the connection lc_ctype?
>
> Encoding of what? A Unicode string has no encoding, or am I wrong? In a
> correct Unicode string, all national characters have the codes that
> correspond to them in the Unicode table. If you have 0xF5 in a Unicode
> string, that is not 0x151 (even if you mean it), and you can hardly
> expect the Unicode <-> national charset conversion to work correctly.
>
> The connection encoding is the encoding in which you will be passing
> the data. The DBMS takes full responsibility for storing the data in the
> encoding specified for the column. The driver just needs to provide the
> data in the encoding specified for the connection.
>
> > What I want to make clear is that lc_ctype is _not_ an encoding.
> > That is only a method to make UPPER and ORDER BY work, but that is
> > not an encoding.
>
> Where did you read this?
>
> API Guide says (page 47):
>
> "isc_dpb_lc_ctype String specifying the character set to be
> utilized".
>
> Language Reference (page 277):
>
> "A character set defines the symbols that can be entered as text in a
> column, and it also defines the maximum number of bytes of storage
> necessary to represent each symbol.... Each character set also has an
> implicit collation order that specifies how its symbols are sorted
> and ordered."
>
> So, lc_ctype _is_ the character set of the client.
>
> Language Reference (page 285):
>
> "SET NAMES specifies the character set the server should use when
> translating data from the database to the client application.
> Similarly, when the client sends data to the database, the server
> translates the data from the client's character set to the database's
> default character set (or the character set for an individual column
> if it differs from the database's default character set)."
>
> SET NAMES sets isc_dpb_lc_ctype for the connection. From the
> citation you can see that the server does translate data between the
> charset of the column and the client's charset.
>
> > Encoding converts ASCII characters to Unicode characters and vice
> > versa. If you send 0xF5 to Firebird, it will store 0xF5 in the
> > database.
>
> Not always (try defining a column with the ASCII charset, 0..127, and
> writing 0xF5). The server accepts 0xF5 only if it is allowed in the
> character set of the connection (NONE, UNICODE_FSS, WIN1251, etc.), and
> it then tries to convert it to the charset of the database or column,
> throwing an exception if that fails. Therefore you cannot store data
> through a connection with the NONE charset into a WIN1250 column,
> simply because Firebird has no hint of how the data you supplied must
> be converted into the WIN1250 charset.
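The "accept only what the target charset can represent" rule can be illustrated with Java's `CharsetEncoder.canEncode`, where a Java charset stands in for the column's character set. This is a sketch under that analogy, not how Firebird checks columns; `fitsColumnCharset` is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CanStoreDemo {
    // Returns true if every character of text can be represented in the
    // given charset (used here as a stand-in for a column's charset).
    static boolean fitsColumnCharset(String text, Charset columnCharset) {
        CharsetEncoder encoder = columnCharset.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        String oDoubleAcute = "\u0151"; // 'ő'
        // A WIN1250-like column can hold it...
        System.out.println(fitsColumnCharset(oDoubleAcute, Charset.forName("windows-1250"))); // true
        // ...an ASCII (0..127) column cannot, so a conversion would fail.
        System.out.println(fitsColumnCharset(oDoubleAcute, StandardCharsets.US_ASCII)); // false
    }
}
```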
>
> > Maybe, if you don't set lc_ctype it will say that character is not
> > valid, but it won't convert it.
>
> Not specifying lc_ctype in the DPB is equivalent to NONE. If you set
> NONE as the charset, you will be able to read data in WIN1250 columns,
> but not write it.
>
> > I think now it is clear that the main problem is that I can't pass
> > 0xF5 to the JDBC driver. What you should see is that in many cases
> > there is no way to create a "correct" unicode string, because the
> > original encoding is not known.
>
> If you do not know the encoding, how can you use the data in the
> database? Java does not have an "unknown" encoding either; it has a
> "default" one. What prevents you from using Cp1250 as the default
> encoding for your JVM? Then you can be sure that this is true:
>
> new String(new byte[]{(byte)0xf5}).charAt(0) == 0x151
>
> > Moreover, there is absolutely no hope of writing data to a column with
> > a different character set with the current system.
>
> Should there be any? In this way you might corrupt data in the
> database. Connect with correct charset (UNICODE_FSS for example),
> provide correct data (UTF8 for example) and Firebird will do the rest.
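A small Java sketch of why a Unicode connection charset side-steps the problem: a string mixing Hungarian 'ő' (U+0151) and Cyrillic 'х' (U+0445) cannot be encoded in windows-1250 or in windows-1251, yet it survives a UTF-8 round trip unchanged. It assumes the driver sends UTF-8 bytes for a UNICODE_FSS connection; the helper names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Hypothetical helpers standing in for "send as UTF8" / "read as UTF8".
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String mixed = "\u0151\u0445"; // 'ő' + 'х'

        // Neither 8-bit code page can hold both characters.
        System.out.println(Charset.forName("windows-1250").newEncoder().canEncode(mixed)); // false
        System.out.println(Charset.forName("windows-1251").newEncoder().canEncode(mixed)); // false

        // UTF-8 represents both (2 bytes each) and decodes back losslessly.
        byte[] utf8 = toUtf8(mixed);
        System.out.println(utf8.length);                   // 4
        System.out.println(mixed.equals(fromUtf8(utf8)));  // true
    }
}
```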
>
> > +1 vote to add an option to disable character set conversion. :)
>
> -1 vote not to add such option.
>
> But sure, you can add this feature to your driver, e.g. as a JVM option
> (please do not add a constant to GDS.java, because it is an API).
>
> Best regards,
> Roman Rokytskyy