Subject | Re: [firebird-support] Firebird client connection charset, server charsets and encoding |
---|---|
Author | Milan Babuskov |
Post date | 2008-11-25T15:04:47Z |
Martijn Tonies wrote:
> Does anyone know a white-paper or decent explanation of
> how the client side connection character set, server side char set
> work together?
>
> Is there any encoding done by Firebird?
Hi Martijn,
Not a white paper, but maybe this can help.
You have multiple character sets involved:
1. charset of the operating system desktop environment
2. connection charset
3. character set of a database table column
There is also a database default charset which is only used as default
when creating new columns and is completely irrelevant here.
When you connect, Firebird assumes that all data you send and receive
is encoded with (2), so if your system charset (1) is different, you need
to transliterate back and forth in your application code (or database
access library).
Once data gets to Firebird, it transliterates between (2) and (3), and
also back when you read the data.
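
The two hops described above can be sketched in Python. This is only an illustration, not Firebird code: the three charsets (a cp1251 desktop, a UTF8 connection, a WIN1251 column) and all function names are assumptions picked for the example, and the "server" functions merely stand in for what Firebird does internally.

```python
# Hypothetical setup, chosen only for illustration:
OS_CHARSET = "cp1251"          # (1) desktop environment
CONNECTION_CHARSET = "utf-8"   # (2) connection charset
COLUMN_CHARSET = "cp1251"      # (3) column charset (WIN1251 in Firebird terms)


def client_to_wire(raw: bytes) -> bytes:
    """Hop (1) -> (2): the application or access library does this."""
    return raw.decode(OS_CHARSET).encode(CONNECTION_CHARSET)


def wire_to_column(wire: bytes) -> bytes:
    """Hop (2) -> (3): stands in for the server's work on write."""
    return wire.decode(CONNECTION_CHARSET).encode(COLUMN_CHARSET)


def column_to_wire(stored: bytes) -> bytes:
    """Hop (3) -> (2): stands in for the server's work on read."""
    return stored.decode(COLUMN_CHARSET).encode(CONNECTION_CHARSET)


def wire_to_client(wire: bytes) -> bytes:
    """Hop (2) -> (1): again the application's job."""
    return wire.decode(CONNECTION_CHARSET).encode(OS_CHARSET)


typed = "Привет".encode(OS_CHARSET)      # bytes as the desktop produces them
wire = client_to_wire(typed)             # what travels over the connection
stored = wire_to_column(wire)            # what ends up in the column
round_trip = wire_to_client(column_to_wire(stored))

print(len(typed), len(wire), len(stored))
print(round_trip == typed)
```

The same text has a different byte representation at each stage, and it only survives the round trip because every hop knows which charset it is converting from and to.
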
Now, if you use NONE, then it means there is no transliteration. If both
your columns(3) and database connection charset(2) are NONE, there would
be no transliteration, and you can read and write everything using (1).
We call this 'garbage in - garbage out' :) There are two problems
with this:
1. Different client applications might run on different operating systems.
Consider English, Russian and Chinese versions of Windows XP working on
the same database, or even add a few Linux clients that default to UTF8.
You'll get a mess.
2. Sorting and various character-based functions like CHAR_LENGTH don't
work unless you use ASCII all the time.
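
Both problems can be reproduced outside the database. The following Python sketch is an analogy, not a description of Firebird internals: it assumes a Russian desktop writing cp1251 bytes, a Western European desktop reading them as cp1252, and a UTF-8 Linux client. The helper `decodes_as` and the sample words are made up for the example.

```python
text = "Привет"

# Problem 1: with NONE everywhere, the raw bytes of the writer's desktop
# charset are stored as-is. Here the writer uses cp1251.
stored = text.encode("cp1251")

# A client on a Western European desktop interprets the same bytes as cp1252.
seen_on_western_desktop = stored.decode("cp1252")
print(seen_on_western_desktop)


def decodes_as(data: bytes, charset: str) -> bool:
    """Return True if the bytes are valid in the given charset."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# A Linux client expecting UTF-8 cannot even decode them.
print(decodes_as(stored, "utf-8"))

# Problem 2: without a known charset, only bytes can be counted and compared.
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))   # characters vs. bytes

words = ["apple", "Zebra", "\u00c4rger"]
byte_order = sorted(words, key=lambda w: w.encode("utf-8"))
print(byte_order)                   # plain byte-value order
```

Byte-wise counting and ordering agree with what a human expects only for pure ASCII text (and, for ordering, only within a single letter case), which is the limitation described in point 2.
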
HTH
--
Milan Babuskov
http://www.flamerobin.org
http://www.guacosoft.com