Subject Re: External tables with ISO8859-1 or UTF characters [SOLVED]
Author goddert
Hello,

I solved the problems.

[ISO8859-1]
With character set ISO8859-1, for a reason I still don't know, my encoder of the files to be read as external tables in case of a character >0x7f eliminated one byte of the string so the padding with spaces wasn't correct anymore. Shame .. :-(

[UTF-8]
For the encoding of files to be used as external tables is valid:

A field of n characters must be encoded with maximal_bytes_of_char * field length bytes.
Example definition of an UTF8 field in an external table:
"index" CHAR(5) CHARACTER SET UTF8

UTF8 uses maximal 4 bytes for a character.
Therefore this field has to be represented by 4 * 5 bytes in the file.
If the encoding of your string doesn't fill these 20 bytes you have to pad it on the RIGHT with spaces. So also, e.g., a right formated number (' 45') occupies the left side of the 4 * 5 bytes (in UTF8 encoded characters of numbers use only 1 byte). The rest has to be padded with spaces on the RIGHT.

Hopefully I explained it in a comprehensible way.

Thanks

--- In firebird-support@yahoogroups.com, Goddert C <goddert@...> wrote:
>
> Hello,
> I have some problems reading an external table containing
> characters in ISO8859-1 into a db (db charset is UNICODE_FSS)
>
> The table
>
> CREATE TABLE MEDIAINFO_VIDEOFILE EXTERNAL FILE 'video.dat'
> ("index" CHAR(5) CHARACTER SET ISO8859_1, "sep1" CHAR(1) CHARACTER SET
> ISO8859_1,
> "path" CHAR(128) CHARACTER SET ISO8859_1, "sep2" CHAR(1) CHARACTER SET
> ISO8859_1,
> "size" CHAR(15) CHARACTER SET ISO8859_1, "sep3" CHAR(1) CHARACTER SET ISO8859_1,
> "origlang" CHAR(127) CHARACTER SET ISO8859_1, "sep4" CHAR(1) CHARACTER
> SET ISO8859_1,
> "length" CHAR(15) CHARACTER SET ISO8859_1, "sep5" CHAR(1) CHARACTER
> SET ISO8859_1,
> "videocodec" CHAR(127) CHARACTER SET ISO8859_1, "sep6" CHAR(1)
> CHARACTER SET ISO8859_1,
> "framerate" CHAR(7) CHARACTER SET ISO8859_1, "sep7" CHAR(1) CHARACTER
> SET ISO8859_1,
> "videobitrate" CHAR(10) CHARACTER SET ISO8859_1, "sep8" CHAR(1)
> CHARACTER SET ISO8859_1,
> "aspect" CHAR(32) CHARACTER SET ISO8859_1, "sep9" CHAR(1) CHARACTER
> SET ISO8859_1,
> "resolution" CHAR(16) CHARACTER SET ISO8859_1, "newline" CHAR(2)
> CHARACTER SET ISO8859_1);
>
> works fine until I try to read a line with a character whose hexcode
> is greater than 0x7f (I suppose, I'm not sure which is the limit)
> E.g.., if I try to read a line with the character 0xE9 (é) the db
> doesn't read the line with those chars above 0x7f (no select result).
> At the same point if I do an insert into the external table with the
> same values (in particular with those "special" characters the db
> writes this same line which I tried to read. That is to say the
> "special" char is coded with 0xE9. This is an 1-byte charset so I
> didn't expect any special mapping.
>
> Where is my fault? Could it be a problem of the db charset UNICODE_FSS
> (which unfortunately I cannot change)?
>
> In the long term I would like to change to UTF8 but also here I have
> some problems.
> Defining the fields of the table above with character set UTF8 the db
> doesn't read anything. Keeping instead the charset ISO8859-1 for the
> fields above it reads lines with UTF8 chars (like 0xC3A9 - é) but the
> mapping of those chars obviously(?) is wrong.
> A file UTF8 encoded for an external table must have a BOM signature or
> not? It didn't make any difference for me.
>
> I'm grateful for any help.
> Thanks
>
> Windows XPSP3, Firebird 2.5
>