firebird-support - Re: [firebird-support] Character set questions

Subject	Re: [firebird-support] Character set questions
Author	Stefan Heymann
Post date	2009-11-16T18:39:14Z

Alan,

> A database created in character set NONE:
> some columns are defined as something other than NONE e.g. many system table
> columns
> Application connects with NONE
> Now there is a subtle move to unicode afoot.

> What happens if I set the default application connection to UTF8? (i.e. what
> transformations/transliterations will be forced into the system?)

Firebird will transliterate between all kinds of encodings using this
scheme:

Character Set A -> Unicode -> Character Set B

So when your table column is ISO8859_1 and your connection character
set is UTF8, it will do

ISO8859_1 -> Unicode -> UTF-8
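
To make the scheme concrete, here is a minimal Python sketch of the same
"via Unicode" idea. It only illustrates the principle, not Firebird's
internal code; the function name `transliterate` is made up for this
example, and the codec names are Python's, not Firebird's.

```python
def transliterate(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one character set to another via Unicode."""
    text = data.decode(source)   # Character Set A -> Unicode
    return text.encode(target)   # Unicode -> Character Set B


# "é" is the single byte 0xE9 in ISO 8859-1 and two bytes in UTF-8.
latin1_bytes = b"caf\xe9"
utf8_bytes = transliterate(latin1_bytes, "iso8859_1", "utf-8")
print(utf8_bytes)
```

The first step only works because the source character set is known,
which is exactly what is missing with NONE.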

When A is NONE, it is difficult, if not impossible, to make the
transition to Unicode. NONE is an "array of bytes" and there is no
assumption about anything beyond US-ASCII. I don't know what Firebird
will do with such characters; you'll have to try and find out.
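
The underlying problem can be shown outside Firebird. The sketch below
(plain Python, with a made-up helper `decode_or_none`) takes bytes as a
NONE column might hold them and tries to read them under different
assumptions. It says nothing about what Firebird itself does in this
situation.

```python
def decode_or_none(raw: bytes, charset: str):
    """Return the decoded text, or None if the bytes are invalid in charset."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None


# Hypothetical column content: "1", the byte $BD, then " cups".
raw = b"1\xbd cups"

print(decode_or_none(raw, "utf-8"))    # $BD alone is not valid UTF-8
print(decode_or_none(raw, "cp1252"))   # readable only if we guess the code page
```

Without knowing which code page the bytes were entered in, there is no
reliable way to pick the right interpretation.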

> What happens if after some time for whatever reason I change to ASCII
> connection? or vice versa?
> (what will happen to the contents of field values?)

ASCII is not much better than NONE, because ASCII is also restricted
to US-ASCII (characters $00..$7F, or 0..127). All characters beyond
127 are illegal.
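
If you want to know whether existing data would survive under ASCII, you
can scan it for bytes above 127. A small sketch, assuming you already
have the raw column bytes in hand (the helper name
`non_ascii_positions` is invented for this example):

```python
def non_ascii_positions(raw: bytes) -> list:
    """Return (index, byte value) for every byte outside $00..$7F."""
    return [(i, b) for i, b in enumerate(raw) if b > 0x7F]


sample = b"1\xbd cups"
for index, value in non_ascii_positions(sample):
    print(f"byte {index}: ${value:02X} is not US-ASCII")
```

An empty result means the value is pure US-ASCII and reads the same
under ASCII, WIN1252, ISO8859_1 and UTF8.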

> Is it OK to go from ASCII to UTF8 but likely issues if you go the
> other way?

I think it's better to go "up". With UTF8 you have the complete
Unicode range, so when your code is able to handle that, there is no
need to ever change it to something else, no matter what language
you have to process.

> If all this is done in the context of an application written for and
> used under Australian (US English) with whatever code page WindowsXP
> runs with, will this cause any other issues? or defray possible
> issues?

English is handled with codepage Windows-1252, which also handles
Western European languages like French, Spanish, Italian, German etc.
So when you know that your application will be limited to these
languages, you can use WIN1252 as the character set for the columns
and the client connection character set.

> Some users like to enter Alt key combinations to achieve 1/2 (half)
> characters. Will those characters be transformed if they were done
> during a NONE connection but reviewed under an ASCII or UTF8
> connection?

NONE will store the single byte value that the "half" character had
when it was entered. ASCII does not have this character at all.

The Unicode code point is
U+00BD VULGAR FRACTION ONE HALF
The byte value $BD is the same in ISO-8859-1 (Latin-1) and
Windows-1252.

Other character sets like ISO-8859-2 (Latin-2) and
Windows-1250 for Eastern European languages (like Czech and Polish)
don't have this character.
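
You can check which character sets contain the "half" character with a
few lines of Python. This uses Python's own codec tables (not
Firebird's), and `can_encode` is just a helper written for this
example.

```python
def can_encode(char: str, charset: str) -> bool:
    """True if the character exists in the given character set."""
    try:
        char.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


HALF = "\u00bd"  # VULGAR FRACTION ONE HALF

for charset in ("latin-1", "cp1252", "iso8859_2", "cp1250"):
    print(charset, can_encode(HALF, charset))
```

The first two report True, the last two False.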

> At the moment I see the correct 1/2 or 1/4 character when seen under ASCII
> connections where they have been created under NONE connections.. is this an
> accident or is this the way it should be?

This is an accident. The NONE character set stored it as $BD. The
ASCII connection apparently doesn't care about characters > 127 that
it doesn't know, and passes them on as it gets them. When displayed on
an English language system using Windows-1252, the byte will render as
the 1/2 sign you are expecting. Display it on a Czech PC using
Windows-1250 and you will see a DOUBLE ACUTE ACCENT (whatever that
is ...).
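
The effect is easy to reproduce: take the stored byte and interpret it
under both code pages. Again a plain Python sketch with an invented
helper name (`describe`), just to show that the same byte means
different things on different systems.

```python
import unicodedata


def describe(stored: bytes, code_page: str) -> str:
    """Return the Unicode name of a single byte read in the given code page."""
    return unicodedata.name(stored.decode(code_page))


stored = b"\xbd"  # what the NONE column holds

for code_page in ("cp1252", "cp1250"):
    char = stored.decode(code_page)
    print(code_page, f"U+{ord(char):04X}", describe(stored, code_page))
```

Under cp1252 this gives U+00BD VULGAR FRACTION ONE HALF, under cp1250
it gives U+02DD DOUBLE ACUTE ACCENT.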

HTH

Best Regards

Stefan