Subject | Re: [firebird-support] UTF8 in firebird ? |
---|---|
Author | Vander Clock Stephane |
Post date | 2012-01-06T22:52:30Z |
> > no, you can store ALL the UTF8 chars in iso-8859-1 :)
> > this is the purpose of utf8, to stay compatible with all the previous
> > systems.
>
> No it isn't possible. You could attempt to store unicode codepoints in
> ISO-8859-1 by inventing your own encoding, but you cannot store UTF-8
> encoded characters in ISO-8859-1 because the multi-byte encodings do not
> fit in a single byte ISO-8859-1. If you would take multiple ISO-8859-1
> characters to store the encoding, you cannot do that because some
> bytes (7F - 9F) are not allowed in ISO-8859-1 (they are used in
> Windows-1252 which is based on ISO-8859-1, but also uses 7F-9F).
>

I'm not inventing my own encoding! I simply store the UTF8 bytes in
iso8859_1 (1 UTF8 byte = 1 iso8859_1 char).

Where do you see that some bytes are forbidden in ISO8859_1? Firebird
never complains about it! http://www.arkadia.com/rus/ looks like it
works :) It's based on an ISO8859_1 database where UTF8 is stored!
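A quick way to see what this trick actually puts in the column: Python's `latin-1` codec maps every byte value 0x00-0xFF to exactly one character, so UTF-8 bytes survive the round trip. This is a minimal sketch that uses Python's codec as a stand-in for the database (an assumption; it does not check what a given Firebird version accepts in an ISO8859_1 column). It also lists which UTF-8 bytes fall in the 0x80-0x9F range discussed above.

```python
# Sketch: store UTF-8 bytes "as if" they were ISO-8859-1 characters.
# Assumption: Python's latin-1 codec stands in for the database's ISO8859_1
# handling; it maps all 256 byte values, including the C1 range 0x80-0x9F.


def utf8_as_latin1(text: str) -> str:
    """Reinterpret the UTF-8 bytes of `text` as one latin-1 char per byte."""
    return text.encode("utf-8").decode("latin-1")


def latin1_back_to_text(stored: str) -> str:
    """Undo utf8_as_latin1: recover the original bytes, then decode as UTF-8."""
    return stored.encode("latin-1").decode("utf-8")


def c1_bytes(text: str) -> list[int]:
    """Return the UTF-8 bytes of `text` that fall in the range 0x80-0x9F."""
    return [b for b in text.encode("utf-8") if 0x80 <= b <= 0x9F]


if __name__ == "__main__":
    for word in ["école", "Привет", "€"]:
        stored = utf8_as_latin1(word)
        assert latin1_back_to_text(stored) == word  # lossless round trip
        print(word, len(word), "chars ->", len(stored), "stored chars,",
              "bytes in 0x80-0x9F:", [hex(b) for b in c1_bytes(word)])
```

"école" produces no bytes in that range, while Cyrillic and "€" do, because UTF-8 continuation bytes span 0x80-0xBF.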
> > UTF8 uses only bytes > 127 to encode special chars. But knowing
> > you, I'm sure you already knew that ... I'm only talking about
> > storage here, not decoding ....
>
> If you talk about storage of UTF8 without using actual UTF8, you need to
> use CHARACTER SET OCTETS.
>

Yes, but I prefer the iso8859_1 collation, as we mostly target Latin
languages. Of course this breaks when multi-byte UTF8 characters are
present.
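The collation breakage is easy to demonstrate: once the UTF-8 bytes are read as ISO-8859-1, case mapping works on the wrong characters. This sketch uses Python's `str.upper()` as a rough stand-in for a case-insensitive Latin collation (an assumption; a real database collation has its own rules).

```python
# Sketch: what case folding does to UTF-8 bytes stored as ISO-8859-1 text.
# Assumption: str.upper() stands in for a case-insensitive Latin collation.


def utf8_as_latin1(text: str) -> str:
    """Reinterpret the UTF-8 bytes of `text` as one latin-1 char per byte."""
    return text.encode("utf-8").decode("latin-1")


def latin1_back_to_text(stored: str) -> str:
    """Recover the original bytes, then decode them as UTF-8."""
    return stored.encode("latin-1").decode("utf-8")


word = "école"

# Stored as real ISO-8859-1 text, upper-casing behaves as expected.
print(word.upper())                          # ÉCOLE

# Stored as UTF-8 bytes in an ISO-8859-1 column, "é" becomes "Ã©".
stored = utf8_as_latin1(word)
print(stored)                                # Ã©cole
print(stored.upper())                        # Ã©COLE
print(latin1_back_to_text(stored.upper()))   # éCOLE: the "é" was skipped
```

"Ã" is already upper case and "©" has no case, so the accented letter never changes, even though the ASCII letters around it do.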
> > take this example: in html all special chars are handled like &eacute; &lt;
> > etc... Does that mean that I will need to make my varchar field 5x
> > bigger to store html encoded text?? Of course not, except if I store
> > Cyrillic or Chinese chars ...
>
> That is not comparable at all as they are escape sequences not character
> encodings (and if you use UTF8 as your page encoding for HTML, you don't
> need to use most escape sequences).
>

Not comparable at all? Sorry, it's fully comparable! When they invented
UTF8 they just thought about a mechanism to encode chars (like html
does). So they said all chars below 128 stay as they are, and all chars
above 127 are encoded in 2 or more bytes. Something like html does: for
example the char ">" is encoded in 4 bytes, "&gt;".
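The rule described above (code points below 128 stay one byte, everything else takes 2 to 4 bytes) can be written out directly. This is a minimal hand-rolled encoder following the standard UTF-8 bit layout, checked against Python's built-in codec, printed next to an HTML reference for the same character. The `html_entity` helper is hypothetical: it uses the named entity from Python's `html.entities.codepoint2name` when one exists, else a numeric reference.

```python
from html.entities import codepoint2name


def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point using the UTF-8 bit layout."""
    if cp < 0x80:                      # U+0000..U+007F: 1 byte, plain ASCII
        return bytes([cp])
    if cp < 0x800:                     # U+0080..U+07FF: 2 bytes
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # U+0800..U+FFFF: 3 bytes
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogates cannot be encoded in UTF-8")
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                 # U+10000..U+10FFFF: 4 bytes
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of range")


def html_entity(ch: str) -> str:
    """Hypothetical helper: named HTML entity if one exists, else &#NNN;."""
    cp = ord(ch)
    name = codepoint2name.get(cp)
    return f"&{name};" if name else f"&#{cp};"


if __name__ == "__main__":
    for ch in [">", "é", "Ж", "中", "😀"]:
        encoded = utf8_encode(ord(ch))
        assert encoded == ch.encode("utf-8")  # agrees with Python's codec
        ref = html_entity(ch)
        print(f"{ch!r}: UTF-8 {len(encoded)} byte(s), HTML {ref!r} "
              f"({len(ref)} chars)")
```

The difference the quoted reply points at is visible in the output: ">" is 1 byte in UTF-8 and only becomes 4 characters as an HTML escape, while non-ASCII characters grow in both schemes.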
> Spanish and French work just fine with ISO-8859-1, if you also need
> Polish, then yes you will definitely need UTF8.
>

And have a 2x bigger database and a 2x slower system because of this :(
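How much bigger depends on the text. This sketch only counts the bytes that the same made-up sample strings take in each encoding; for mostly-Latin text, only the accented letters grow, from 1 to 2 bytes each. It does not model Firebird's storage or speed; separately, as far as we understand the Firebird documentation, the declared length of a UTF8 CHAR/VARCHAR column is budgeted at up to 4 bytes per character regardless of content.

```python
from typing import Optional

# Made-up sample strings, for illustration only.
SAMPLES = {
    "French": "Élève à l'école",
    "Spanish": "Mañana en España",
    "Russian": "Привет",
}


def sizes(text: str) -> tuple[Optional[int], int]:
    """Return (ISO-8859-1 bytes or None if not representable, UTF-8 bytes)."""
    try:
        latin1: Optional[int] = len(text.encode("latin-1"))
    except UnicodeEncodeError:
        latin1 = None  # e.g. Cyrillic has no ISO-8859-1 representation
    return latin1, len(text.encode("utf-8"))


if __name__ == "__main__":
    for lang, text in SAMPLES.items():
        latin1, utf8 = sizes(text)
        print(f"{lang}: {len(text)} chars, ISO-8859-1 {latin1}, UTF-8 {utf8}")
```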