Subject | Re: [firebird-support] UTF8 in firebird ? |
---|---|
Author | Lester Caine |
Post date | 2012-01-06T12:37:53Z |
Mark Rotteveel wrote:
> The current standard says they will not go further than 0x10FFFF (partly
> because of UTF-16). I assume with '3 bytes', you mean if you do not use
> UTF8, because U+010000 to U+10FFFF is encoded as 11110www 10zzzzzz 10yyyyyy
> 10xxxxxx for codepoint 000wwwzz zzzzyyyy yyxxxxxx. "UTF-16 limits Unicode
> to 10FFFFhex; therefore UTF-8 is not defined beyond that value, even if it
> could easily be defined to reach 7FFFFFFFhex." (from
> http://en.wikipedia.org/wiki/UTF-8).
I prefer http://tools.ietf.org/html/rfc3629 myself ...
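For anyone who wants to check the four-byte pattern quoted above, here is a minimal Python sketch. The function name encode_4byte is made up for illustration; it only handles the U+10000 to U+10FFFF range from the quote and is compared against Python's own encoder, so treat it as a sanity check rather than a full UTF-8 implementation.

```python
def encode_4byte(cp: int) -> bytes:
    """Encode one code point in U+10000..U+10FFFF with the 4-byte UTF-8 pattern.

    Hypothetical helper for illustration only; real code should just use
    str.encode("utf-8").
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside U+10000..U+10FFFF")
    return bytes([
        0xF0 | (cp >> 18),           # 11110www
        0x80 | ((cp >> 12) & 0x3F),  # 10zzzzzz
        0x80 | ((cp >> 6) & 0x3F),   # 10yyyyyy
        0x80 | (cp & 0x3F),          # 10xxxxxx
    ])


if __name__ == "__main__":
    cp = 0x1F600
    manual = encode_4byte(cp)
    builtin = chr(cp).encode("utf-8")
    # Both should be the same four bytes.
    print(manual.hex(), manual == builtin)
```

Only 21 of the 32 bits carry the code point; the rest are the '11110' and '10' marker bits.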
But my point was more about WORKING with Unicode data than about the different
methods of transmitting it.
wchar_t (the 32-bit one) may be a slower method of working, but it is at least
reliably consistent.
UNICODE_FSS to my mind was a much more logical storage mechanism for Unicode
data, being internally 24 bit, and I think that since the data being stored in a
record IS compressed, then as long as the 'uncompress' knows what it is looking
at, losing the '10' flag bits is not a problem? Internally simply store and work
in UTF32 ... you only need the UTF8 encoding when you send stuff down the wire
or to other applications?
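A rough Python sketch of that idea, purely as an illustration and not how Firebird does anything internally: decode UTF-8 at the boundary, work on fixed-width 32-bit code units, and re-encode only when the data leaves. The helper names (utf8_to_utf32, utf32_to_utf8, code_point_at) are invented for this example.

```python
import struct


def utf8_to_utf32(data: bytes) -> bytes:
    """Convert wire-format UTF-8 into fixed-width little-endian UTF-32."""
    return data.decode("utf-8").encode("utf-32-le")


def utf32_to_utf8(data: bytes) -> bytes:
    """Convert the internal UTF-32 form back to UTF-8 for the wire."""
    return data.decode("utf-32-le").encode("utf-8")


def code_point_at(utf32: bytes, index: int) -> int:
    """Fetch the index-th code point directly: it always sits at byte 4 * index."""
    (cp,) = struct.unpack_from("<I", utf32, 4 * index)
    return cp


if __name__ == "__main__":
    # "Zürich 東京 😀" written with escapes so the source file stays ASCII.
    wire = "Z\u00fcrich \u6771\u4eac \U0001F600".encode("utf-8")
    internal = utf8_to_utf32(wire)
    print(len(wire), len(internal))            # variable-width vs 4 bytes per code point
    print(hex(code_point_at(internal, 10)))    # last character, no scanning needed
    print(utf32_to_utf8(internal) == wire)     # round trip back to the wire format
```

The trade-off is the one mentioned above: the internal form is larger (44 bytes against 19 here), but every character is the same width, so indexing never has to scan for lead bytes.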
The original post from Stéphane was about speeding up 'UTF8', but totally missed
the point of what UTF8 is trying to do, which is to store the full Unicode
universe transparently. There is no point 'reinventing the wheel' ... if you
only need 256 characters ... don't use Unicode! The problem I am increasingly
seeing is international customers supplying their address details in 'strange'
character sets, and often they just get messed up even in the emails, but I am
now fairly reliably managing to handle that data and even create address labels
that are correct (which annoys the local postmistress, since she can't do the
international signed-for address :) ). The 'Windows' halfway house is just a
kludge that does not work ... since I moved to a Linux desktop I've not had ANY
problem with strange characters in emails! And pushing international emails
directly into the database just works ...
--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php