Subject | Re: [firebird-support] Character set questions |
---|---|
Author | Stefan Heymann |
Post date | 2009-11-16T18:39:14Z |
Alan,
> A database created in character set NONE:
> some columns are defined as something other than NONE e.g. many system table
> columns
> Application connects with NONE
> Now there is a subtle move to unicode afoot.
> What happens if I set the default application connection to UTF8? (i.e. what
> transformations/transliterations will be forced into the system?)
Firebird will transliterate between all kinds of encodings using this
scheme:
Character Set A -> Unicode -> Character Set B
So when your table column is ISO8859_1 and your connection character
set is UTF8, it will do
ISO8859_1 -> Unicode -> UTF-8
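To make the scheme concrete, here is a minimal sketch in Python. It uses
Python's own codecs, not Firebird's transliteration code, so take it as an
illustration of the idea only. The function name transliterate and the
sample bytes are made up for this example.

```python
def transliterate(data: bytes, source: str, target: str) -> bytes:
    """Character Set A -> Unicode -> Character Set B."""
    text = data.decode(source)   # A -> Unicode
    return text.encode(target)   # Unicode -> B


# "café" as it would be stored in an ISO8859_1 column: $E9 is "é".
stored = b"caf\xe9"
on_the_wire = transliterate(stored, "iso8859_1", "utf-8")
print(on_the_wire)
```

In UTF-8 the single byte $E9 becomes the two bytes $C3 $A9, so the byte
count changes even though the text is the same.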
When A is NONE, it is difficult, if not impossible, to do the
transition to Unicode. NONE is an "array of byte" and there is no
assumption about anything beyond US-ASCII. I don't know what Firebird
will do with bytes beyond that range; you'll have to try and find out.
> What happens if after some time for whatever reason I change to ASCII
> connection? or vice versa?
> (what will happen to the contents of field values?)
ASCII is not a lot better than NONE, because ASCII is also restricted
to US-ASCII (characters $00..$7F or 0..127). So all characters beyond
127 are illegal.
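A quick way to see what "illegal" means here, again sketched with Python's
ascii codec rather than with Firebird itself (whose exact reaction I have
not tested). Both function names are hypothetical helpers for this example.

```python
def is_us_ascii(data: bytes) -> bool:
    """True when every byte is in the US-ASCII range $00..$7F."""
    return all(byte <= 0x7F for byte in data)


def decode_as_ascii(data: bytes) -> str:
    """Decode strictly as US-ASCII; raises UnicodeDecodeError above 127."""
    return data.decode("ascii")


print(is_us_ascii(b"1/2"))    # three plain ASCII characters
print(is_us_ascii(b"\xbd"))   # a single byte above 127

try:
    decode_as_ascii(b"\xbd")
except UnicodeDecodeError as error:
    print("not US-ASCII:", error)
```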
> Is it OK to go from ASCII to UTF8 but likely issues if you go the
> other way?
I think it's better to go "up". With UTF8 you have the complete
Unicode range, so when your code is able to handle that, there is no
need to ever change that to something else, no matter what language
you have to process.
> If all this is done in the context of an application written for and
> used under Australian (US English) with whatever code page WindowsXP
> runs with, will this cause any other issues? or defray possible
> issues?
English is handled with codepage Windows-1252, which also handles
Western European languages like French, Spanish, Italian, German etc.
So when you know that your application will be limited to these
languages, you can use WIN1252 as the character set for the columns
and the client connection character set.
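If you want to check whether a given text fits into Windows-1252, a small
sketch like the following may help. It relies on Python's cp1252 codec,
which is Python's name for that code page; the helper encodable and the
sample words are assumptions for this illustration.

```python
def encodable(text: str, charset: str) -> bool:
    """True when every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "French": "\u00e9t\u00e9",                 # été
    "German": "Stra\u00dfe",                   # Straße
    "Spanish": "se\u00f1or",                   # señor
    "Czech": "p\u0159\u00edli\u0161",          # příliš
}

for language, word in samples.items():
    print(language, encodable(word, "cp1252"), encodable(word, "utf-8"))
```

The Western European words fit into Windows-1252, the Czech one does not
because of the "r with caron", and all of them fit into UTF-8.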
> Some users like to enter Alt key combinations to achieve 1/2 (half)
> characters. Will those characters be transformed if they were done
> during a NONE connection but reviewed under an ASCII or UTF8
> connection?
NONE will store the single byte value that the "half" character had
when it was entered. ASCII does not have this character at all.
The Unicode code point is
00BD VULGAR FRACTION ONE HALF
The value $BD is also the same for ISO-8859-1 (Latin-1) and
Windows-1252.
Other character sets like ISO-8859-2 (Latin-2) and
Windows-1250 for Eastern European languages (like Czech and Polish)
don't have this character.
> At the moment I see the correct 1/2 or 1/4 character when seen under ASCII
> connections where they have been created under NONE connections.. is this an
> accident or is this the way it should be?
This is an accident. The NONE character set stored it as $BD; the
ASCII connection obviously doesn't care about characters > 127 it
doesn't know and passes them on as it gets them. When displayed on an
English language system using Windows-1252, it will render as the 1/2
sign you are expecting. Display it on a Czech PC using Windows-1250
and you will see a DOUBLE ACUTE ACCENT (whatever that is ...).
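You can look up what the byte $BD means in each code page with a few lines
of Python. This only uses Python's standard codec tables and the
unicodedata module; describe_byte is a hypothetical helper name.

```python
import unicodedata


def describe_byte(value: int, charset: str) -> str:
    """Decode one byte in the given charset and name the resulting character."""
    character = bytes([value]).decode(charset)
    return f"U+{ord(character):04X} {unicodedata.name(character)}"


for charset in ("iso8859_1", "cp1252", "iso8859_2", "cp1250"):
    print(charset, describe_byte(0xBD, charset))
```

The same stored byte comes out as the "half" sign under the Western code
pages and as the double acute accent under the Eastern European ones.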
HTH
Best Regards
Stefan