Subject | Re: [firebird-support] Re: Using unicode versus WIN1252 (Firebird 2) |
---|---|
Author | Milan Babuskov |
Post date | 2009-01-10T16:47:41Z |
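Douglas Tosi wrote:
> Not the same thing.
It is.
> If Firebird uses the same space to store each of them, I don't see the
> point in having both.
Firebird uses the same space to store ISO-8859-1 and WIN1252 and they
cover the same set of characters. So, what's the point then?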
---------------
Unicode is a set of characters, which can be represented in various
binary forms.
- One of those is UTF8. Firebird gives you UTF8 when you send/receive
data, but internally it reserves 4 bytes per character.
- Another one is UNICODE_FSS, for which Firebird reserves 3 bytes per character.
So, UTF8 and UNICODE_FSS are just binary encodings of the Unicode character set.
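To make the difference between the character set and its encodings concrete, here is a small Python sketch. It measures how many UTF-8 bytes a few characters take and computes the space reserved for a declared column length, using the 4 and 3 bytes-per-character figures from the post. The names `MAX_BYTES_PER_CHAR`, `utf8_lengths` and `reserved_bytes` are hypothetical helpers for illustration, not part of any Firebird API.

```python
# Maximum bytes per character as stated in the post above.
# Hypothetical table for illustration, not read from a real Firebird database.
MAX_BYTES_PER_CHAR = {"UTF8": 4, "UNICODE_FSS": 3}


def utf8_lengths(text: str) -> list:
    """Return the UTF-8 byte length of each character in text."""
    return [len(ch.encode("utf-8")) for ch in text]


def reserved_bytes(charset: str, declared_chars: int) -> int:
    """Bytes reserved for a column of declared_chars characters,
    assuming a fixed maximum width per character."""
    return MAX_BYTES_PER_CHAR[charset] * declared_chars


if __name__ == "__main__":
    # 'A', e-acute, euro sign, grinning face: 1, 2, 3 and 4 UTF-8 bytes.
    for ch in "A\u00e9\u20ac\U0001F600":
        print(f"U+{ord(ch):04X}: {len(ch.encode('utf-8'))} byte(s) in UTF-8")
    print("UTF8, 10 chars:", reserved_bytes("UTF8", 10))
    print("UNICODE_FSS, 10 chars:", reserved_bytes("UNICODE_FSS", 10))
```

The same text is one Unicode string throughout; only its byte representation, and therefore the space needed for it, depends on the encoding.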
Since many programs and operating systems use UTF8 by default, it is
nice to have a UTF8 connection character set, so you don't have to
convert in your code.
OTOH, if you used a UTF8 connection charset with UNICODE_FSS storage,
Firebird would have to transliterate back and forth each time you read
from or write to the database. It's more efficient to use UTF8 for
storage as well.
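Transliteration here means decoding bytes from one encoding into characters and re-encoding them in another. The Python sketch below models that step; `transliterate` is a hypothetical helper, and since Python has no `UNICODE_FSS` codec the example pairs UTF-8 with Windows-1252 (`cp1252`) instead. It only illustrates why matching connection and storage charsets lets the bytes pass through untouched; it is not Firebird's actual code.

```python
import codecs


def transliterate(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from encoding src to encoding dst (hypothetical helper)."""
    # Normalize aliases such as "UTF8" and "utf-8" to one canonical codec name.
    if codecs.lookup(src).name == codecs.lookup(dst).name:
        # Same encoding on both sides: nothing to convert.
        return data
    # Different encodings: decode to Unicode text, then re-encode.
    # Raises UnicodeDecodeError/UnicodeEncodeError on unmappable input.
    return data.decode(src).encode(dst)


if __name__ == "__main__":
    sent = "caf\u00e9".encode("utf-8")                # what a UTF8 client sends
    stored = transliterate(sent, "utf-8", "cp1252")   # write path
    back = transliterate(stored, "cp1252", "utf-8")   # read path
    print(sent, stored, back)
```

Every read and every write in the mismatched setup pays for a decode and an encode, while the matching setup returns the input unchanged.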
Now, why does Firebird use 4 bytes per character instead of writing UTF8
bytes directly to the disk? My guess is that indexes would work much
slower in that case, since the engine would have to understand data
semantics (how many bytes each character takes) instead of just comparing
bytes, which is really fast. Presumably, disk space was sacrificed for
performance.
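The guess above hinges on one property: with a fixed width, the engine can find the n-th character by arithmetic, while with UTF-8 it has to scan the bytes to see where each character starts. The Python sketch below contrasts the two; `utf8_char_offset` and `fixed_width_offset` are hypothetical helpers, and UTF-32 stands in for a 4-bytes-per-character layout. It is not a description of how Firebird actually lays out records or index keys.

```python
def utf8_char_offset(data: bytes, n: int) -> int:
    """Byte offset where the n-th character (0-based) starts in UTF-8 data.

    UTF-8 is variable-width, so the bytes must be walked: continuation
    bytes have the bit pattern 10xxxxxx, every other byte starts a character.
    """
    count = -1
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # lead byte: a new character starts here
            count += 1
            if count == n:
                return i
    raise IndexError(f"character {n} out of range")


def fixed_width_offset(n: int, width: int = 4) -> int:
    """With a fixed width per character, the offset is plain arithmetic."""
    return n * width


if __name__ == "__main__":
    text = "A\u00e9\u20ac\U0001F600"
    utf8 = text.encode("utf-8")
    for n in range(len(text)):
        # Scan-based UTF-8 offset versus computed fixed-width offset.
        print(n, utf8_char_offset(utf8, n), fixed_width_offset(n))
```

The UTF-8 lookup costs time proportional to the position, while the fixed-width lookup is constant time at the price of up to four times the space.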
--
Milan Babuskov
http://www.flamerobin.org
http://www.guacosoft.com