Subject | Re: [ib-support] Problem with field length |
---|---|
Author | Claudio Valderrama C. |
Post date | 2001-12-12T04:35:47Z |
<Herbert.Augustiny@...> wrote in message
news:OFC4BE2C7F.A3C66886-ONC1256B1F.00361D74@......
unicode string fields by default. Is this what you intended, right? If you
want your table to be plain 10 bytes, you have to override the db-charset
using
text CHAR(10) character set NONE;
I will proceed thinking that you want unicode always. Your underlying field
is of length 30 bytes. See LangRef almost at the tail, where the size of
each charset is given (in bytes per logical character). You need to connect
to your db using UNICODE_FSS. I don't know how you do that from Java, sorry.
If you don't specify the charset at connection time, it happens in charset
NONE. At least this work if you start the command-line isql and type:
set names unicode_fss;
connect 'db:/dbs/mycharset.gdb';
insert into ...
If you don't use the charset at connection time, the field will get one-byte
characters, then you may enter up to 30 ASCII characters. You're treating
the unicode field as a raw ascii field of length 30. (If not, ask Helen when
she was storing information for the IBDI Book.) Trying to read this back
will cause errors as those characters not necessarily conform to the UTF
specifications for unicode encoding. From unicode_fss POV, you may have
stored trash. There's an internal table in FB that checks that characters
for a given charset are in the valid range. Unicode may use one, two or
three bytes, depending on the specific character you're storing. It's not an
always-two-bytes unicode like wchar_t in C and C++, for example.
The simple rule is:
create your db in charset C.
Connect with charset C to insert data.
Connect with charset C to retrieve and modify data.
Beware that exceptions' text is stored in charset NONE due to an
implementation handicap.
WRT my change commented by Helen, it only affects metadata definitions in
the engine and metadata extraction in isql. And I added new functionality
before RC2 was tagged because Ann, Paul B and Martijn insisted that when you
change a string field, the old charset should be preserved. (Will have to
check if the old collation is being wiped out in the future.)
C.
--
Claudio Valderrama C. - http://www.cvalde.com - http://www.firebirdSql.org
Independent developer
Owner of the Interbase® WebRing
news:OFC4BE2C7F.A3C66886-ONC1256B1F.00361D74@......
> HiSurprisingly
>
> I have a table where I declare a char(10) field and then in an insert
> statement I insert a string which is to long (maybe 30 chars).
> this works fine. Also when I do a select in isql on this table I amgetting
> the whole string and not just the first 10 chars. But if I do a selectThat's the issue. Since your db-wide charset is unicode, your table uses
> using interclient I am getting an exception:
>
> SQL error code = -303
> arithmetic exception, numeric overflow, or string truncation
>
> Here is the script I am using to reproduce this behavior:
>
> CREATE DATABASE 'db:/dbs/error.gdb'
> DEFAULT CHARACTER SET UNICODE_FSS;
>
> create table test(
> text CHAR(10)
> );
unicode string fields by default. Is this what you intended, right? If you
want your table to be plain 10 bytes, you have to override the db-charset
using
text CHAR(10) character set NONE;
I will proceed thinking that you want unicode always. Your underlying field
is of length 30 bytes. See LangRef almost at the tail, where the size of
each charset is given (in bytes per logical character). You need to connect
to your db using UNICODE_FSS. I don't know how you do that from Java, sorry.
If you don't specify the charset at connection time, it happens in charset
NONE. At least this work if you start the command-line isql and type:
set names unicode_fss;
connect 'db:/dbs/mycharset.gdb';
insert into ...
If you don't use the charset at connection time, the field will get one-byte
characters, then you may enter up to 30 ASCII characters. You're treating
the unicode field as a raw ascii field of length 30. (If not, ask Helen when
she was storing information for the IBDI Book.) Trying to read this back
will cause errors as those characters not necessarily conform to the UTF
specifications for unicode encoding. From unicode_fss POV, you may have
stored trash. There's an internal table in FB that checks that characters
for a given charset are in the valid range. Unicode may use one, two or
three bytes, depending on the specific character you're storing. It's not an
always-two-bytes unicode like wchar_t in C and C++, for example.
The simple rule is:
create your db in charset C.
Connect with charset C to insert data.
Connect with charset C to retrieve and modify data.
Beware that exceptions' text is stored in charset NONE due to an
implementation handicap.
WRT my change commented by Helen, it only affects metadata definitions in
the engine and metadata extraction in isql. And I added new functionality
before RC2 was tagged because Ann, Paul B and Martijn insisted that when you
change a string field, the old charset should be preserved. (Will have to
check if the old collation is being wiped out in the future.)
C.
--
Claudio Valderrama C. - http://www.cvalde.com - http://www.firebirdSql.org
Independent developer
Owner of the Interbase® WebRing