Subject | RE: [firebird-support] non us characters in comments fail |
---|---|
Author | Mark Rotteveel |
Post date | 2014-04-25T08:38:15Z |
On Fri, 25 Apr 2014 07:55:35 +0000, Pekka Paunio
<pekka.paunio@...> wrote:
> Charset is UTF8 and Server version is 2.5.2.

Is that your *connection* character set or your *database* character set?
Also, how are you executing this script? It is very easy for a tool to use
the connection character set UTF-8 but then send bytes that are, for
example, ISO-8859-1. That leads to transliteration errors, because some
byte combinations are invalid UTF-8. An ä in ISO-8859-1 is byte 228, or
1110 0100. In UTF-8 a byte starting with 1110 announces a character of
3 bytes, so it must be followed by two bytes whose two highest bits are 10.
However, your ä is followed by a space (32 or 0010 0000) and an asterisk
(*, 42 or 0010 1010), so it is an invalid UTF-8 character!
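
To make the byte arithmetic concrete, here is a minimal Python sketch of the check described above. The helper names (`expected_length`, `is_continuation`, `is_valid_utf8`) are made up for illustration and have nothing to do with how Firebird itself validates input; the sample text "ä *" is taken from the description of your comment.

```python
# The three characters "ä *" as an ISO-8859-1 tool would send them.
latin1_bytes = "ä *".encode("iso-8859-1")   # b'\xe4 *'


def expected_length(lead: int) -> int:
    """Sequence length announced by a UTF-8 lead byte (0 = not a lead byte)."""
    if lead >> 7 == 0b0:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    return 0


def is_continuation(byte: int) -> bool:
    """A UTF-8 continuation byte has its two highest bits set to 10."""
    return byte >> 6 == 0b10


def is_valid_utf8(data: bytes) -> bool:
    """Let Python's strict UTF-8 decoder decide."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


lead, second, third = latin1_bytes
# Bit patterns of the three bytes: 11100100, 00100000, 00101010
patterns = [format(b, "08b") for b in latin1_bytes]

# 228 announces a 3-byte character, but neither follower is a continuation byte.
announced = expected_length(lead)                              # 3
followers_ok = is_continuation(second) and is_continuation(third)  # False
```

For comparison, a correctly encoded ä in UTF-8 is the two bytes 195 and 164 (1100 0011, 1010 0100), which the same check accepts.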
The same happens when you connect with connection character set NONE: the
client sends the bytes as is, and Firebird then tries to store them as
UNICODE_FSS in the system table, which fails for the same reason.
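
If the script really is ISO-8859-1, one way out is to transcode the bytes to UTF-8 before they are sent over a UTF-8 connection. The sketch below is only an illustration in Python: `to_utf8` is a hypothetical helper, and the ISO-8859-1 source encoding is an assumption about your file, not something I know.

```python
def to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode bytes from an assumed source encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# With NONE (or a tool that ignores the connection character set) the bytes
# go out unchanged, so the lone byte 228 reaches the server as is.
sent_as_is = "ä *".encode("iso-8859-1")      # b'\xe4 *'

# After transcoding, ä becomes the valid two-byte sequence 195, 164.
sent_transcoded = to_utf8(sent_as_is)        # b'\xc3\xa4 *'
```

Whether your tool can do this conversion for you depends on the tool, which is why I am asking how you execute the script.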
Mark