Subject | Why does 'é' (e with acute accent, Unicode 0xE9) seem to take 3 bytes in Firebird 'UTF8'? (2.1.1) |
---|---|
Author | Olivier Mascia |
Post date | 2008-08-06T13:21:34Z |
Hello,
The code point 'é' (Unicode U+00E9) takes 2 octets in UTF-8.
Is it different with Firebird? (2.x with x >=1)
Would it mean that Firebird UTF8 actually means 3-bytes-per-char
UNICODE_FSS with only the logical length taken into account?
Using a WIN1252 connection to a DEFAULT CHARACTER SET UTF8 database:
INSERT INTO MYTABLE(NAME) VALUES('ééééé'); -- the letter é, 5 times
Using a UTF8 connection to that database:
SELECT * FROM MYTABLE;
gets me 15 bytes: five repetitions of the three bytes D4C7DC (in hex).
In my book I should have gotten 5 times C3A9 in hex, which is *the* UTF-8
representation of U+00E9.
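
To make the expectation concrete, here is a minimal Python sketch (not
Firebird code; the names E_ACUTE, hex_upper and is_valid_utf8 are only
illustrative). It encodes U+00E9 in both character sets involved, and
checks whether the bytes I actually received are well-formed UTF-8 at all:

```python
# U+00E9 (é) in the two encodings mentioned above.
E_ACUTE = "\u00e9"

utf8_bytes = E_ACUTE.encode("utf-8")      # what a UTF-8 client should receive
win1252_bytes = E_ACUTE.encode("cp1252")  # what the WIN1252 client sends


def hex_upper(data: bytes) -> str:
    """Return the bytes as an uppercase hex string, e.g. b'\\xc3\\xa9' -> 'C3A9'."""
    return data.hex().upper()


def is_valid_utf8(data: bytes) -> bool:
    """True if the byte string decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Five times é, as in the INSERT statement: expected UTF-8 payload.
expected = (E_ACUTE * 5).encode("utf-8")

# The three-byte unit actually observed, repeated five times.
observed_unit = bytes.fromhex("D4C7DC")

print(hex_upper(utf8_bytes), hex_upper(win1252_bytes), len(expected))
# prints: C3A9 E9 10
print(is_valid_utf8(observed_unit * 5))
# prints: False
```

So the expected payload is 10 bytes, not 15, and D4C7DC is not even a
well-formed UTF-8 sequence: D4 announces a two-byte sequence, but C7 is
not a continuation byte.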
So what is this UTF8 thing in Firebird parlance?
What do I need to do to really input, store and output real UTF8
data? Use CHARSET NONE everywhere and take care of everything at the
application level? Not a big issue for my architecture, but I'd
better know.
Yours,
--
Olivier Mascia