Subject Re: [firebird-support] UTF8 in firebird ?
Author Vander Clock Stephane
> > No, you can store ALL the UTF8 chars in ISO-8859-1 :)
> > This is the purpose of UTF8: to stay compatible with all the previous
> > systems.
> No it isn't possible. You could attempt to store unicode codepoints in
> ISO-8859-1 by inventing your own encoding,
I'm not inventing my own encoding! I simply store the UTF8 code units in
ISO8859_1 (1 UTF8 code unit = 1 byte).

> but you cannot store UTF-8
> encoded characters in ISO-8859-1 because the multi-byte encodings do not
> fit in a single byte ISO-8859-1. If you would take multiple ISO-8859-1
> characters to store the encoding, you cannot do that because some bytes
> (7F - 9F) are not allowed in ISO-8859-1 (they are used in Windows-1252,
> which is based on ISO-8859-1 but also uses 7F-9F).

Where do you see that some bytes are forbidden in ISO8859_1? Firebird never
complains about it! It looks like it works :) This is based on an ISO8859_1
database where UTF8 is stored!
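
To make the byte-level behaviour concrete, here is a minimal Python sketch of
storing UTF8 bytes one-per-character in a single-byte charset. It is not a
Firebird test: the helper names are hypothetical, and Python's "latin-1"
codec (which maps all 256 byte values) only stands in for an ISO8859_1 column
that accepts every byte.

```python
# Hypothetical helpers; they model the byte-level round trip only,
# not Firebird's own handling of ISO8859_1 columns.

def store_utf8_as_latin1(text: str) -> str:
    """Encode as UTF-8, then reinterpret each byte as one ISO-8859-1 character."""
    return text.encode("utf-8").decode("latin-1")

def load_utf8_from_latin1(stored: str) -> str:
    """Reverse the step above: take the raw bytes back and decode them as UTF-8."""
    return stored.encode("latin-1").decode("utf-8")

original = "été €"
stored = store_utf8_as_latin1(original)

print(len(original), len(stored))                 # 5 9
print(load_utf8_from_latin1(stored) == original)  # True
# The euro sign encodes to E2 82 AC, so byte 0x82 (inside 80-9F) is stored too.
print(any(0x80 <= ord(c) <= 0x9F for c in stored))  # True
```

In this sketch the round trip is lossless, but the stored string is 9
"characters" long for 5 real ones, so lengths, upper/lower casing and
collation all operate on the individual bytes rather than on the characters.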

> > UTF8 uses only bytes > 127 to encode special chars, but as I know
> > you, I'm sure you already knew that ... I am just speaking here about
> > storage, not decoding ....
> If you talk about storage of UTF8 without using actual UTF8, you need to

Yes, but I prefer the ISO8859_1 collation, as we mostly target Latin
languages. Of course this breaks when multi-byte UTF8 sequences are found.

> > Take this example: in HTML all special chars are handled like &eacute;,
> > &lt;, etc. Does that mean that I will need to multiply by 5 the size of
> > the varchar field that I use to store HTML-encoded text? Of course not,
> > unless I store Cyrillic or Chinese chars ...
> That is not comparable at all as they are escape sequences not character
> encodings (and if you use UTF8 as your page encoding for HTML, you don't
> need to use most escape sequences).

Not comparable at all? Sorry, it's fully comparable! When they invented
UTF8 they just thought about a mechanism to encode chars (like HTML does).
So they said all chars up to ASCII 127 will stay as they are, and all chars
above 127 will be encoded in 2 or more bytes. HTML does something similar:
for example, the char ">" is encoded in 4 bytes, "&gt;".
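
The two mechanisms being compared can be put side by side in a short Python
sketch. The helper names are hypothetical; `codepoint2name` is the standard
library's table of named HTML entities, used here only to show what an
escaped form would look like.

```python
from html.entities import codepoint2name

def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies once encoded as UTF-8."""
    return len(ch.encode("utf-8"))

def html_entity(ch: str) -> str:
    """Named HTML entity if one exists, else plain ASCII or a numeric reference."""
    name = codepoint2name.get(ord(ch))
    if name:
        return f"&{name};"
    return ch if ord(ch) < 128 else f"&#{ord(ch)};"

for ch in ["a", ">", "é", "ł", "Ж", "中"]:
    entity = html_entity(ch)
    print(repr(ch), "UTF-8 bytes:", utf8_length(ch),
          "| HTML:", entity, f"({len(entity)} bytes)")
```

Characters up to 127 stay at one byte in UTF8, "é" takes 2 bytes, and "中"
takes 3, while the HTML forms "&gt;" and "&eacute;" take 4 and 8 ASCII bytes
respectively.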

> Spanish and French work just fine with ISO-8859-1, if you also need
> Polish, then yes you will definitely need UTF8.

and have a database 2x bigger and a system 2x slower
because of this :(
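
As a rough way to check the size question on real data, the sketch below
compares the encoded length of a sample string in both charsets. The sample
sentences are made up, and this measures only the encoded bytes; how Firebird
sizes UTF8 columns, buffers and index keys is not modelled here.

```python
# Made-up sample sentences; substitute text that is typical for your data.
french = "Où est passé le café de l'été dernier ?"
polish = "Zażółć gęślą jaźń"

latin1_size = len(french.encode("latin-1"))
utf8_size = len(french.encode("utf-8"))
non_ascii = sum(ord(c) > 127 for c in french)

print("ISO-8859-1 bytes:", latin1_size)
print("UTF-8 bytes:     ", utf8_size)
print("non-ASCII chars: ", non_ascii)

# Polish letters such as "ż" and "ł" have no ISO-8859-1 code at all.
try:
    polish.encode("latin-1")
    polish_fits = True
except UnicodeEncodeError:
    polish_fits = False
print("Polish fits in ISO-8859-1:", polish_fits)
```

For Latin-1 text the UTF8 size exceeds the ISO-8859-1 size by one byte per
non-ASCII character, so the growth of the encoded data depends on how many
accented characters the text actually contains.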
