Subject RE: [firebird-support] Character set questions
Author Alan McDonald
> Alan,
>
> > A database created in character set NONE:
> > some columns are defined as something other than NONE e.g. many
> > system table columns
> > Application connects with NONE
> > Now there is a subtle move to unicode afoot.
>
> > What happens if I set the default application connection to UTF8?
> > (i.e. what transformations/transliterations will be forced into the
> > system?)
>
> Firebird will transliterate between all kinds of encodings using this
> scheme:
>
> Character Set A -> Unicode -> Character Set B
>
> So when your table column is ISO8859_1 and your connection character
> set is UTF8, it will do
>
> ISO8859_1 -> Unicode -> UTF-8
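
As an illustration only, the two-step scheme above can be sketched in Python, with Python's own codecs standing in for whatever Firebird does internally. The function name `transliterate` and the sample bytes are assumptions made for this sketch, not anything Firebird exposes.

```python
def transliterate(raw: bytes, source: str, target: str) -> bytes:
    """Character Set A -> Unicode -> Character Set B."""
    text = raw.decode(source)     # step 1: A -> Unicode
    return text.encode(target)    # step 2: Unicode -> B


# "café" as it would sit in an ISO8859_1 column (one byte per character).
latin1_bytes = b"caf\xe9"

# What a UTF8 connection would receive: the last character now takes two bytes.
utf8_bytes = transliterate(latin1_bytes, "iso-8859-1", "utf-8")
print(utf8_bytes)
```

The first step only works when the source character set is actually known, which is why NONE is the awkward case.
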
>
> When A is NONE, it is difficult, if not impossible, to do the
> transition to Unicode. NONE is an "array of bytes" and there is no
> assumption about anything beyond US-ASCII. I don't know what Firebird
> will do with such characters; you'll have to try and find out.
>
> > What happens if after some time for whatever reason I change to ASCII
> > connection? or vice versa?
> > (what will happen to the contents of field values?)
>
> ASCII is not a lot better than NONE, because ASCII is also restricted
> to US-ASCII (characters $00..$7F or 0..127), so all characters beyond
> 127 are illegal.
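
A small Python sketch of the $00..$7F restriction, assuming the stored value is available as raw bytes. The helper name `is_us_ascii` and the sample values are hypothetical, and Python's strict "ascii" codec is used only to show what "illegal" means for a strict decoder. It says nothing about how Firebird itself reacts.

```python
def is_us_ascii(raw: bytes) -> bool:
    """True when every byte is in the US-ASCII range $00..$7F."""
    return all(byte <= 0x7F for byte in raw)


plain = b"1/2 cup"        # typed with ordinary keys: all bytes <= 127
with_half = b"\xbd cup"   # typed with an Alt combination: $BD is > 127

print(is_us_ascii(plain), is_us_ascii(with_half))

# A strict ASCII decoder refuses the second value outright.
try:
    with_half.decode("ascii")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)
```
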
>
> > Is it OK to go from ASCII to UTF8 but likely issues if you go the
> > other way?
>
> I think it's better to go "up". With UTF8 you have the complete
> Unicode range, so when your code is able to handle that, there is no
> need to ever change that to something else, no matter what language
> you have to process.
>
> > If all this is done in the context of an application written for and
> > used under Australian (US English) with whatever code page WindowsXP
> > runs with, will this cause any other issues? or defray possible
> > issues?
>
> English is handled with codepage Windows-1252, which also handles
> Western European languages like French, Spanish, Italian, German etc.
> So when you know that your application will be limited to these
> languages, you can use WIN1252 as the character set for the columns
> and the client connection character set.
>
> > Some users like to enter Alt key combinations to achieve 1/2 (half)
> > characters. Will those characters be transformed if they were done
> > during a NONE connection but reviewed under an ASCII or UTF8
> > connection?
>
> NONE will store the single byte value that the "half" character had
> when it was entered. ASCII does not have this character at all.
>
> The Unicode code point is
> 00BD VULGAR FRACTION ONE HALF
> The value $BD is also the same for ISO-8859-1 (Latin-1) and
> Windows-1252.
>
> Other character sets like ISO-8859-2 (Latin-2) and
> Windows-1250 for Eastern European languages (like Czech and Polish)
> don't have this character.
>
>
> > At the moment I see the correct 1/2 or 1/4 character when seen under
> > ASCII connections where they have been created under NONE
> > connections. Is this an accident or is this the way it should be?
>
> This is an accident. The NONE character set stored it as $BD; the
> ASCII connection evidently doesn't check characters > 127 that it
> doesn't know and passes them on as it gets them. When displayed on an
> English language system using Windows-1252, it will render as the 1/2
> sign you are expecting. Display it on a Czech PC using Windows-1250
> and you will see a DOUBLE ACUTE ACCENT (whatever that is ...).
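
The code-page dependence described above can be checked with Python's standard codecs. This is only a sketch: the names `RAW` and `describe` are made up for the example, and the Python codec names stand in for the Windows and ISO code pages mentioned in the thread.

```python
import unicodedata

RAW = b"\xbd"  # the single byte a NONE column would hold for the "half" sign


def describe(raw: bytes, codepage: str) -> str:
    """Return the Unicode name of the character that `raw` decodes to."""
    return unicodedata.name(raw.decode(codepage))


# Same byte, different meaning depending on who interprets it.
for codepage in ("cp1252", "iso-8859-1", "cp1250", "iso-8859-2"):
    print(f"{codepage:>10}: {describe(RAW, codepage)}")

# Over a UTF8 connection the "half" sign is no longer one byte but two.
half_utf8 = "\u00bd".encode("utf-8")
print(half_utf8)
```
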
>
> HTH
>
> Best Regards
>
> Stefan

Thanks Stefan. That's pretty comprehensive.
I think I can be content, then, that shifting from NONE-based connections to
ASCII connections, where I am using ISO-8859-1 (Latin-1) and Windows-1252, is
my best "half-way house" for the moment. Moving the application to Unicode
will be a long-term ambition, I think.
Correct me if I'm wrong, but when that time comes I should be thinking of
re-creating the DB with all columns which are currently NONE changed to
UTF8, then datapumping everything across?
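
For what it's worth, the re-encoding step of such a datapump might look roughly like this in Python. It is a sketch under stated assumptions: the NONE columns are assumed to hold Windows-1252 bytes (true only if every client that wrote them ran that code page), the function name `pump` and the sample rows are hypothetical, and reading from and writing to Firebird is left out entirely.

```python
def pump(rows, assumed_source="cp1252"):
    """Re-encode raw NONE values to UTF-8, setting aside what cannot be decoded."""
    converted, rejected = [], []
    for raw in rows:
        try:
            # Interpret the bytes under the ASSUMED code page, then store as UTF-8.
            converted.append(raw.decode(assumed_source).encode("utf-8"))
        except UnicodeDecodeError:
            # Byte values the assumed code page does not define need manual review.
            rejected.append(raw)
    return converted, rejected


# Hypothetical column values: one with the "half" sign, one plain, one undefined byte.
converted, rejected = pump([b"1\xbd cups", b"plain", b"\x81"])
print(converted, rejected)
```

Values that contain only bytes up to 127 come through unchanged. Only the values with bytes above 127 depend on the code-page assumption being right.
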

Alan