firebird-support - Re: [firebird-support] UTF8 in firebird ?

Subject	Re: [firebird-support] UTF8 in firebird ?
Author	Vander Clock Stephane
Post date	2012-01-06T22:52:30Z

> > no, you can store ALL UTF8 characters in iso-8859-1 :)
> > this is the purpose of utf8: to stay compatible with all previous
> > systems.
>
> No, it isn't possible. You could attempt to store unicode codepoints in
> ISO-8859-1 by inventing your own encoding,
>

I'm not inventing my own encoding! I simply store the UTF8 bytes in iso8859_1
(1 UTF8 byte = 1 iso8859_1 character).
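A minimal Python sketch of the storage trick described above, assuming the column simply keeps whatever bytes the client sends: each byte of the UTF-8 encoding is treated as one ISO-8859-1 character, and the client decodes it back on read. Python's `latin-1` codec stands in for the database column here. It maps all 256 byte values, which is what makes the round trip lossless in this sketch; whether Firebird itself accepts every byte value in an ISO8859_1 column is not something this code checks. The helper names are hypothetical.

```python
def store_as_latin1(text: str) -> str:
    """Hypothetical helper: reinterpret the UTF-8 bytes of `text` as
    ISO-8859-1 characters (one stored character per UTF-8 byte)."""
    return text.encode("utf-8").decode("latin-1")


def load_from_latin1(stored: str) -> str:
    """Hypothetical helper: recover the original bytes, then decode them as UTF-8."""
    return stored.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    for original in ["hello", "élève", "Москва"]:
        stored = store_as_latin1(original)
        # The stored form uses one character per UTF-8 byte...
        print(repr(original), "->", repr(stored),
              f"({len(original)} chars -> {len(stored)} stored chars)")
        # ...and decodes back to the original text unchanged.
        assert load_from_latin1(stored) == original
```

Plain ASCII text is stored unchanged, so only non-ASCII characters take more than one stored character.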

> but you cannot store UTF-8
> encoded characters in ISO-8859-1 because the multi-byte encodings do not
> fit in a single ISO-8859-1 byte. If you tried to use multiple ISO-8859-1
> characters to store the encoding, that would not work either, because some
> bytes (7F-9F) are not allowed in ISO-8859-1 (they are used in Windows-1252,
> which is based on ISO-8859-1 but also uses 7F-9F).
>

Where do you see that some bytes are forbidden in ISO8859_1? Firebird never
complains about it!

http://www.arkadia.com/rus/ seems to work :) It's based on an ISO8859_1
database where UTF8 text is stored!
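To make the disagreement concrete: the quoted concern is that multi-byte UTF-8 sequences contain bytes in the 0x80-0x9F range, which ISO-8859-1 reserves for C1 control codes rather than printable characters (Windows-1252 assigns printable characters to most of them). The Python sketch below, using only the standard `utf-8` codec, lists which bytes of a string's UTF-8 encoding fall into that range. Whether Firebird rejects such bytes in an ISO8859_1 column is a separate question that this code does not answer; `c1_bytes` is a hypothetical helper name.

```python
C1_RANGE = range(0x80, 0xA0)  # bytes 0x80-0x9F: C1 control codes in ISO-8859-1


def c1_bytes(text: str) -> list[int]:
    """Hypothetical helper: return the UTF-8 bytes of `text` that fall in 0x80-0x9F."""
    return [b for b in text.encode("utf-8") if b in C1_RANGE]


if __name__ == "__main__":
    for sample in ["élève", "€", "Жизнь"]:
        hits = c1_bytes(sample)
        print(repr(sample), [f"0x{b:02X}" for b in hits])
```

Accented French letters such as "é" (C3 A9) avoid the range, but the euro sign (E2 82 AC) and many Cyrillic letters (for example "Ж", D0 96) do not.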

>
> > UTF8 uses only bytes > 127 to encode special characters. But knowing
> > you, I'm sure you already knew that... I'm only talking about
> > storage here, not decoding...
>
> If you talk about storage of UTF8 without using actual UTF8, you need to
> use CHARACTER SET OCTETS.
>

Yes, but I prefer the iso8859_1 collation, as we mostly target Latin
languages. Of course, this breaks when multi-byte UTF8 characters are found.
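A rough illustration of how the ISO8859_1 collation interacts with text stored this way, assuming a Latin-1-only UPPER that leaves characters without a single-character Latin-1 uppercase untouched. `latin1_upper` is a hypothetical stand-in written for this sketch, not Firebird's actual collation code. Because "é" is stored as the two characters "Ã©", the collation sees "Ã" and "©" rather than "é", so case mapping already misbehaves for accented Latin letters, not only for Cyrillic or Chinese.

```python
def latin1_upper(text: str) -> str:
    """Hypothetical stand-in for UPPER under an ISO-8859-1 collation:
    only map characters whose uppercase form is a single Latin-1 character."""
    out = []
    for ch in text:
        up = ch.upper()
        out.append(up if len(up) == 1 and ord(up) < 0x100 else ch)
    return "".join(out)


def upper_via_latin1_column(text: str) -> str:
    """Uppercase `text` as it would come back after storing its UTF-8 bytes
    in a Latin-1 column, applying latin1_upper, and decoding again."""
    stored = text.encode("utf-8").decode("latin-1")
    upper_stored = latin1_upper(stored)
    # Uppercasing a lead byte (e.g. 0xE2 'â' -> 0xC2 'Â') can yield invalid UTF-8,
    # so undecodable bytes are replaced instead of raising.
    return upper_stored.encode("latin-1").decode("utf-8", errors="replace")


if __name__ == "__main__":
    for word in ["paris", "élève", "жизнь"]:
        print(word, "| expected:", word.upper(),
              "| via Latin-1 column:", upper_via_latin1_column(word))
```

ASCII letters are handled correctly, while the multi-byte characters pass through unchanged (or, for some lead bytes, get corrupted).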

>
> > Take this example: in html all special chars are handled like &eacute;,
> > &lt;, etc. Does that mean that I need to multiply by 5 the size of the
> > varchar fields I use to store html-encoded text?? Of course not, except
> > if I store Cyrillic or Chinese chars...
>
> That is not comparable at all as they are escape sequences not character
> encodings (and if you use UTF8 as your page encoding for HTML, you don't
> need to use most escape sequences).
>

Not comparable at all? Sorry, it's fully comparable! When they invented
UTF8, they just designed a mechanism to encode chars (like html does): all
chars up to ascii 127 stay as they are, and all chars above ascii 127 are
encoded in 2 or more bytes, much like html does.

For example, the char ">" is encoded in html as the 4 bytes "&gt;".
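To put numbers on this comparison, a small Python sketch that prints, for a few characters, the byte length in ISO-8859-1 (where representable), in UTF-8, and as an HTML escape. It relies on the standard-library `html.entities.codepoint2name` table for named entities and falls back to a numeric `&#N;` reference for non-ASCII characters without a name; `html_escape_char` and `utf8_length` are hypothetical helpers written for this sketch. `utf8_length` spells out the UTF-8 length rule: code points up to 127 take one byte, everything above takes two to four.

```python
from html.entities import codepoint2name


def html_escape_char(ch: str) -> str:
    """Hypothetical helper: escape one character HTML-style."""
    cp = ord(ch)
    name = codepoint2name.get(cp)
    if name:
        return f"&{name};"        # named entity, e.g. '&gt;' or '&eacute;'
    if cp < 0x80:
        return ch                 # plain ASCII needs no escaping
    return f"&#{cp};"             # numeric reference for everything else


def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a code point."""
    if code_point < 0x80:
        return 1      # ASCII stays a single, unchanged byte
    if code_point < 0x800:
        return 2      # e.g. accented Latin letters, Cyrillic
    if code_point < 0x10000:
        return 3      # rest of the Basic Multilingual Plane, e.g. CJK, '€'
    return 4          # supplementary planes, e.g. emoji


if __name__ == "__main__":
    for ch in [">", "a", "é", "Ж", "€", "中"]:
        assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
        try:
            latin1 = len(ch.encode("latin-1"))
        except UnicodeEncodeError:
            latin1 = None  # not representable in ISO-8859-1
        escaped = html_escape_char(ch)
        print(f"{ch!r}: latin-1={latin1} utf-8={utf8_length(ord(ch))} "
              f"html={escaped} ({len(escaped)} bytes)")
```

Both schemes leave ordinary ASCII alone; they differ in how much they expand the rest.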

>
> Spanish and French work just fine with ISO-8859-1; if you also need
> Polish, then yes, you will definitely need UTF8.
>

and end up with databases twice as big and a system twice as slow
because of this :(
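A quick way to check the size side of this claim for a given text, sketched in Python: compare the raw encoded length in ISO-8859-1 with the raw UTF-8 length. This only measures the encoded characters themselves; how much space Firebird actually uses on disk also depends on how the engine sizes and compresses CHAR/VARCHAR columns for each character set, which this sketch does not model. `encoded_sizes` is a hypothetical helper name.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Hypothetical helper: return (ISO-8859-1 bytes, UTF-8 bytes).
    Raises UnicodeEncodeError if `text` is not representable in Latin-1."""
    return len(text.encode("latin-1")), len(text.encode("utf-8"))


if __name__ == "__main__":
    sample = "Les élèves étudient le français et l'español à l'école."
    latin1, utf8 = encoded_sizes(sample)
    print(f"ISO-8859-1: {latin1} bytes, UTF-8: {utf8} bytes, "
          f"ratio {utf8 / latin1:.2f}")
```

For mostly-ASCII Latin text, only the accented characters grow (from 1 to 2 bytes each), so the raw ratio depends on how many of them the text contains.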
