Subject Why does 'é' (e with acute accent, Unicode 0xE9) seem to take 3 bytes in Firebird 'UTF8'? (2.1.1)
Author Olivier Mascia
Hello,

The code point 'é' (Unicode U+00E9) uses 2 octets in UTF-8.
Is it different with Firebird (2.x with x >= 1)?
Would it mean that Firebird UTF8 actually means 3-bytes-per-char
UNICODE_FSS with only the logical length taken into account?
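
For reference, here is a minimal sketch of what I mean by "2 octets", in plain Python and entirely independent of Firebird. The variable names are only illustrative; the point is how U+00E9 is serialized in the two character sets involved (UTF-8 and Windows-1252).

```python
# Plain Python, no Firebird involved: how the code point U+00E9 is
# serialized in the two character sets discussed in this post.
ch = "\u00e9"  # 'é', LATIN SMALL LETTER E WITH ACUTE

utf8_bytes = ch.encode("utf-8")      # two octets in UTF-8
win1252_bytes = ch.encode("cp1252")  # one octet in Windows-1252

print(utf8_bytes.hex().upper())      # prints C3A9
print(win1252_bytes.hex().upper())   # prints E9
```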


Using a WIN1252 connection to a DEFAULT CHARACTER SET UTF8 database:
INSERT INTO MYTABLE(NAME) VALUES('ééééé'); // 5 times letter é

Using a UTF8 connection to that database:
SELECT * FROM MYTABLE;
gets me 15 bytes:
exactly 5 times the three bytes D4 C7 DC (in hex).


In my book I should have gotten 5 times C3A9 in hex, which is *the* UTF-8
representation of U+00E9.
So what is this UTF8 thing in Firebird parlance?
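
To make the mismatch concrete, here is a small Python sketch comparing what I expected with what I observed. It assumes the hex dump above is accurate; `is_valid_utf8` is just a helper I made up for this check, and none of this says anything about what Firebird does internally.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# What a UTF-8 connection should return for 'ééééé': 5 x C3 A9.
expected = ("\u00e9" * 5).encode("utf-8")

# What the SELECT actually returned, per the hex dump: 5 x D4 C7 DC.
observed = bytes.fromhex("D4C7DC") * 5

print(len(expected), is_valid_utf8(expected))  # prints 10 True
print(len(observed), is_valid_utf8(observed))  # prints 15 False
```

D4 is a lead byte announcing a 2-byte sequence, but C7 is not a continuation byte, so the observed data is not even well-formed UTF-8, let alone a 3-byte encoding of U+00E9.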

What do I need to do to really input, store and output real UTF8
data? Use CHARSET NONE everywhere and take care of everything at the
application level? Not a big issue for my architecture, but I'd
better know.

Yours,

--
Olivier Mascia