Subject | External tables with ISO8859-1 or UTF characters |
---|---|
Author | Goddert C |
Post date | 2012-03-27T17:55:27Z |
Hello,
I have some problems reading an external table that contains
ISO8859-1 characters into a db (the db charset is UNICODE_FSS).
The table
CREATE TABLE MEDIAINFO_VIDEOFILE EXTERNAL FILE 'video.dat' (
  "index" CHAR(5) CHARACTER SET ISO8859_1,
  "sep1" CHAR(1) CHARACTER SET ISO8859_1,
  "path" CHAR(128) CHARACTER SET ISO8859_1,
  "sep2" CHAR(1) CHARACTER SET ISO8859_1,
  "size" CHAR(15) CHARACTER SET ISO8859_1,
  "sep3" CHAR(1) CHARACTER SET ISO8859_1,
  "origlang" CHAR(127) CHARACTER SET ISO8859_1,
  "sep4" CHAR(1) CHARACTER SET ISO8859_1,
  "length" CHAR(15) CHARACTER SET ISO8859_1,
  "sep5" CHAR(1) CHARACTER SET ISO8859_1,
  "videocodec" CHAR(127) CHARACTER SET ISO8859_1,
  "sep6" CHAR(1) CHARACTER SET ISO8859_1,
  "framerate" CHAR(7) CHARACTER SET ISO8859_1,
  "sep7" CHAR(1) CHARACTER SET ISO8859_1,
  "videobitrate" CHAR(10) CHARACTER SET ISO8859_1,
  "sep8" CHAR(1) CHARACTER SET ISO8859_1,
  "aspect" CHAR(32) CHARACTER SET ISO8859_1,
  "sep9" CHAR(1) CHARACTER SET ISO8859_1,
  "resolution" CHAR(16) CHARACTER SET ISO8859_1,
  "newline" CHAR(2) CHARACTER SET ISO8859_1
);
works fine until I try to read a line containing a character whose
hex code is greater than 0x7F (I suppose; I'm not sure where the limit is).
For example, if I try to read a line containing the character 0xE9 (é),
the db does not return that line (the select gives no result).
At the same time, if I insert the same values into the external table
(in particular those "special" characters), the db writes exactly the
same line that I tried to read. That is to say, the "special" char is
encoded as 0xE9. This is a 1-byte charset, so I didn't expect any
special mapping.
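As a byte-level illustration of the point above, here is a minimal Python sketch (Python is used only for illustration and is not part of the setup described here). It checks that é is the single byte 0xE9 in ISO8859-1 and adds up the field widths from the table definition, assuming each CHAR(n) field in a 1-byte charset occupies exactly n bytes in the file. The names `FIELD_WIDTHS` and `record_length` are hypothetical.

```python
# Field widths in characters, copied from the CREATE TABLE statement.
FIELD_WIDTHS = {
    "index": 5, "sep1": 1, "path": 128, "sep2": 1,
    "size": 15, "sep3": 1, "origlang": 127, "sep4": 1,
    "length": 15, "sep5": 1, "videocodec": 127, "sep6": 1,
    "framerate": 7, "sep7": 1, "videobitrate": 10, "sep8": 1,
    "aspect": 32, "sep9": 1, "resolution": 16, "newline": 2,
}


def record_length(bytes_per_char: int = 1) -> int:
    """Length of one fixed-width record, assuming every character
    occupies bytes_per_char bytes (1 for ISO8859-1)."""
    return sum(FIELD_WIDTHS.values()) * bytes_per_char


# U+00E9 (e with acute accent) is a single byte in ISO8859-1.
e_acute = "\u00e9".encode("iso8859-1")

# A "path" value padded with spaces to the declared width of 128 characters.
padded_path = "caf\u00e9.avi".ljust(FIELD_WIDTHS["path"]).encode("iso8859-1")

print(e_acute.hex(), record_length(), len(padded_path))
```

Under that assumption, every line of video.dat, including the 2-byte "newline" field, has to be exactly `record_length()` bytes long for the columns to line up.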
Where is my mistake? Could it be a problem with the db charset UNICODE_FSS
(which unfortunately I cannot change)?
In the long term I would like to switch to UTF8, but here too I have
some problems.
If I define the fields of the table above with character set UTF8, the db
doesn't read anything. If I instead keep the charset ISO8859_1 for the
fields, it reads lines containing UTF8 chars (like 0xC3A9 - é), but the
mapping of those chars is obviously(?) wrong.
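The wrong mapping can be reproduced outside the db. A minimal Python sketch (illustration only, not a statement about Firebird internals): the UTF-8 encoding of é is the two bytes 0xC3 0xA9, and a reader that treats them as ISO8859-1 sees two separate characters. It also shows that, in UTF-8, the byte length of a value is no longer equal to its character count.

```python
# UTF-8 encodes U+00E9 as two bytes.
utf8_bytes = "\u00e9".encode("utf-8")

# Interpreting those two bytes as ISO8859-1 yields two characters
# (U+00C3 followed by U+00A9) instead of one.
misread = utf8_bytes.decode("iso8859-1")

# Character count and byte count differ once non-ASCII characters appear.
name = "caf\u00e9"
char_count = len(name)
byte_count = len(name.encode("utf-8"))

print(utf8_bytes.hex(), repr(misread), char_count, byte_count)
```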
Must a UTF8-encoded file for an external table have a BOM signature or
not? It didn't make any difference for me.
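For reference, a UTF-8 BOM is the three bytes 0xEF 0xBB 0xBF at the start of the file. The Python sketch below (illustration only) shows what that does to a fixed-width layout, under the unconfirmed assumption that the file is read as raw fixed-width records and the BOM is not skipped. The sample record content and the `;` separator are hypothetical.

```python
import codecs

# The UTF-8 byte order mark as defined by the standard library.
bom = codecs.BOM_UTF8

# Hypothetical start of a record: 5-character "index" field plus a separator.
record_start = b"00001;"

# With a BOM in front, the first 5 bytes are no longer the "index" value.
with_bom = bom + record_start
first_field_without_bom = record_start[:5]
first_field_with_bom = with_bom[:5]

print(bom.hex(), first_field_without_bom, first_field_with_bom)
```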
I'm grateful for any help.
Thanks
Windows XP SP3, Firebird 2.5