Subject Transliteration of identifiers, query buffers etc?
Author Kjell Rilbe

I've been wondering a bit about how Firebird handles transliteration of
various parts of a query, in particular regarding (quoted) identifiers.

My situation is that I have a database with default charset UTF8 and all
char/varchar columns use this charset. I also always use UTF8 as the
connection charset.

I would assume that this means that Firebird expects to receive query
strings encoded in UTF8, including identifiers and string literals that
appear in the query.

At the same time, I know that identifiers are stored in columns with
charset Unicode_FSS, which as far as I understand is identical to UTF8
except that 1) it will accept malformed strings and 2) it will allocate a
buffer that fits 4 x maxlength bytes and will accept any string that
fits in that buffer, even if the number of Unicode characters exceeds
maxlength.
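To make those two differences concrete, here is a small Python model of them. It is only a sketch of the behaviour as I understand it, not Firebird's actual code: `utf8_accepts` and `fss_accepts` are hypothetical helpers, and the 4-bytes-per-character buffer is taken from the description above.

```python
# Hypothetical model of the two checks described above; NOT Firebird source code.

def utf8_accepts(data: bytes, maxlength: int) -> bool:
    """Strict UTF8 model: bytes must be well-formed and hold at most maxlength characters."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return len(text) <= maxlength


def fss_accepts(data: bytes, maxlength: int, bytes_per_char: int = 4) -> bool:
    """Lenient Unicode_FSS model: only the byte length of the buffer is checked."""
    return len(data) <= bytes_per_char * maxlength


ascii_name = b"a" * 100   # 100 characters, 100 bytes
truncated = b"\xc3"       # first byte of a two-byte UTF-8 sequence only

print(utf8_accepts(ascii_name, 31), fss_accepts(ascii_name, 31))  # False True
print(utf8_accepts(truncated, 31), fss_accepts(truncated, 31))    # False True
```

In this model, both the over-long ASCII name and the truncated byte sequence would pass the lenient check but fail the strict one.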

Are there any other differences between Unicode_FSS and UTF8?
Are all valid UTF8 strings < maxlength identical to the corresponding
Unicode_FSS strings?

Also, string literals can be given a charset other than UTF8 by using a
charset introducer. Does this mean that the query buffer sent to the
server actually contains segments with different encodings? Or is the
query buffer always 100% encoded in the connection charset?
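The two alternatives differ in the actual bytes on the wire. This Python sketch (client side only, with the statement from the test that follows written on one line) shows how the literal `'asdfö'` would be encoded under each hypothesis; it says nothing about what Firebird itself does.

```python
# Compare the bytes a client would send for the literal under two hypotheses.

prefix = "select _win1252 '"
literal = "asdfö"
suffix = "' \"Test\" from rdb$database"

# Hypothesis A: the whole buffer is encoded in the connection charset (UTF8).
buffer_a = (prefix + literal + suffix).encode("utf-8")

# Hypothesis B: the literal is encoded in its introducer charset (WIN1252),
# the rest of the statement in UTF8.
buffer_b = prefix.encode("utf-8") + literal.encode("cp1252") + suffix.encode("utf-8")

print(literal.encode("utf-8").hex(" "))   # 61 73 64 66 c3 b6
print(literal.encode("cp1252").hex(" "))  # 61 73 64 66 f6
print(len(buffer_a) - len(buffer_b))      # 1
```

Only the non-ASCII character differs: "ö" is two bytes (C3 B6) in UTF8 but a single byte (F6) in WIN1252.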

I tried this with a UTF8 connection charset:
select _win1252 'asdfö' "Test"
from rdb$database

It returns this:

So it seems the string literal is encoded in UTF8 and sent that way to
the server, which then interprets those bytes as WIN1252. In other words,
the buffer itself appears to be 100% UTF8. Right?
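If that reading is correct, the returned value should be the UTF8 bytes of `'asdfö'` reread as WIN1252 characters. This Python sketch computes what that string would look like; it models the hypothesis and is not captured Firebird output.

```python
# Reproduce, on the client side, what the server would return if it received
# the UTF8 bytes of the literal and reinterpreted them as WIN1252.

sent = "asdfö".encode("utf-8")      # bytes produced by a UTF8 connection
as_win1252 = sent.decode("cp1252")  # the same bytes read as WIN1252 text

print(as_win1252)  # asdfÃ¶
```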

What about identifiers? Assume I have an identifier "Åäöü€ÉÈÏÿñ". Is
there any situation in which Firebird could get into trouble with it,
assuming I always quote it and always use UTF8 as the connection charset?
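For reference, this is what that identifier looks like in UTF8. The sketch assumes the characters are in precomposed (NFC) form; `quote_identifier` is a hypothetical helper that applies the standard SQL rule for delimited identifiers (double any embedded double quote). The name is 10 characters but 21 bytes, so any limit that counts bytes rather than characters would see 21, not 10. Every character is inside the Basic Multilingual Plane, and the euro sign is the only one needing 3 bytes.

```python
# Hypothetical helper: build a quoted (delimited) identifier the standard SQL
# way, by doubling any embedded double quotes, and report its UTF-8 sizes.

def quote_identifier(name: str) -> str:
    return '"' + name.replace('"', '""') + '"'


name = "Åäöü€ÉÈÏÿñ"  # assumed to be in precomposed (NFC) form
encoded = name.encode("utf-8")

print(quote_identifier(name))                        # "Åäöü€ÉÈÏÿñ"
print(len(name), len(encoded))                       # 10 21
print(max(len(ch.encode("utf-8")) for ch in name))   # 3 (the euro sign)
print(all(ord(ch) <= 0xFFFF for ch in name))         # True: all in the BMP
```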