Subject | Transliteration of identifiers, query buffers etc? |
---|---|
Author | Kjell Rilbe |
Post date | 2014-04-23T09:01:34Z |
Hi,
I've been wondering a bit about how Firebird handles transliteration of
various parts of a query, in particular regarding (quoted) identifiers.
My situation is that I have a database with default charset UTF8 and all
char/varchar columns use this charset. I also always use UTF8 as
connection charset.
I would assume that this means that Firebird expects to receive query
strings encoded in UTF8, including identifiers and string literals that
appear in the query.
At the same time, I know that the identifiers are stored in columns with
charset Unicode_FSS, which as far as I understand is identical to UTF8
except that 1) it will accept malformed strings and 2) it will allocate a
buffer that fits 4 x maxlength bytes and will accept any string that
fits in that buffer, even if the number of Unicode characters > maxlength.
Are there any other differences between Unicode_FSS and UTF8?
Are all valid UTF8 strings < maxlength identical with the corresponding
Unicode_FSS string?
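
To make point 2 concrete, here is a minimal Python sketch of the two kinds of length check as I understand them. It only models my description above and is not Firebird's actual code; the function names and the 4 bytes per character factor are my own assumptions.

```python
# Hypothetical model of the two length checks described above.
# This is NOT Firebird code; names and the bytes_per_char factor are assumptions.

def fits_byte_buffer(text: str, maxlength: int, bytes_per_char: int = 4) -> bool:
    """Byte-based check: accept anything whose UTF-8 form fits the buffer."""
    return len(text.encode("utf-8")) <= maxlength * bytes_per_char


def fits_char_limit(text: str, maxlength: int) -> bool:
    """Character-based check: accept at most maxlength code points."""
    return len(text) <= maxlength


sample = "abcdefgh"  # 8 ASCII characters, 8 bytes in UTF-8

# With maxlength = 4 the buffer is 16 bytes, so the byte-based check passes
# although the string has twice as many characters as maxlength.
print(fits_byte_buffer(sample, 4))
print(fits_char_limit(sample, 4))
```

If the description is right, a string like this would be accepted by a byte-based check and rejected by a character-based one.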
Also, string literals can be specified to be some other charset than
UTF8 - does this mean that the query buffer sent to the server actually
contains segments with different encodings? Or is the query buffer
always 100% encoded in the connection charset?
I tried this with UTF8 connection charset:
select _win1252 'asdfö' "Test"
from rdb$database
It returns this:
asdfö
So, it seems the string literal is encoded in UTF8 and sent that way to
the server, which interprets it as encoded in WIN1252. So, it seems the
buffer itself is 100% UTF8. Right?
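
The result matches what you get when UTF8 bytes are read as WIN1252. A small Python sketch of that reasoning follows; it uses Python's cp1252 codec as a stand-in for WIN1252 and only reproduces the byte-level effect, not anything Firebird does internally.

```python
# Reproduce the observed output outside Firebird.
# Assumption: Python's "cp1252" codec is equivalent to Firebird's WIN1252.

literal = "asdfö"

# What a client would put in the query buffer with a UTF8 connection charset.
wire_bytes = literal.encode("utf-8")

# What comes out if those same bytes are interpreted as WIN1252.
misread = wire_bytes.decode("cp1252")

print(wire_bytes)
print(misread)
```

The "ö" becomes the two bytes C3 B6 in UTF8, and those two bytes are the characters "Ã" and "¶" in WIN1252, which is exactly the output above.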
What about identifiers? Assume I have an identifier "Åäöü€ÉÈÏÿñ". Is
there any instance that Firebird could get into trouble with this
assuming I always quote it and always use UTF8 connection charset?
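
For reference, this is how the character count and the UTF8 byte count of that identifier differ. The Python sketch below only does the counting; whether Firebird's identifier limit applies to bytes or to characters is part of what I'm asking, so no limit is assumed here.

```python
import unicodedata

# Normalize to NFC so every accented letter is a single precomposed code point.
identifier = unicodedata.normalize("NFC", "Åäöü€ÉÈÏÿñ")

char_count = len(identifier)                  # Unicode code points
byte_count = len(identifier.encode("utf-8"))  # bytes in a UTF8 query buffer

print(char_count, byte_count)
```

The euro sign takes three bytes in UTF8 and each of the other nine characters takes two, so the byte length is roughly double the character length.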