Subject | Re: [Firebird-Architect] A Fresh Look at Collations |
---|---|
Author | Jim Starkey |
Post date | 2010-06-21T16:18:45Z |
Sergey Mereutsa wrote:
> Hello Jim,
>
> JS> 1. The database engine itself is strictly utf8 only. Character set
>
> I'm not an expert, but from my experience, working in pure UTF8 is not
> a good idea - it is slow. Maybe a binary Unicode format is better?
> Or is speed not a goal at all?
>
> P.S. All our texts are in UTF8 (because of 2 languages required by
> default - Romanian and Russian). Sometimes it is a pain.
>

It's a multi-national world, hence Unicode. And utf8 because it's
denser and doesn't suffer endian problems of either utf16 or Unicode.
I don't think performance is a significant issue. Sorts and indexes get
resolved to byte streams. Depending on character set, utf8 strings are
longer than their equivalents in most national character sets, but by
less than a factor of two since punctuation, digits, and spaces are all
single-byte characters. Finally, turning utf8 into Unicode is cheap and
fast:
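    // Decode one utf8 sequence at p and advance p past it.
    // utf8Lengths[] and utf8Values[] are tables indexed by the lead byte.
    static inline uint getUnicodeChar(const char*& p)
    {
        UCHAR c = *p++;
        int len = utf8Lengths[c];       // total bytes in this sequence
        uint code = utf8Values[c];      // payload bits of the lead byte

        for (; len > 1; --len)
            code = (code << 6) | (*p++ & 0x3f);     // 6 bits per continuation byte

        return code;
    }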
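
For anyone who wants to try the decoder above on its own, here is a minimal sketch that fills in the pieces the post leaves out. The layout of utf8Lengths and utf8Values, the initUtf8Tables and decodeUtf8 helpers, and the UCHAR/uint typedefs are assumptions made for illustration, not taken from any real code base. Invalid lead bytes are mapped to U+FFFD as one possible choice, and the input is assumed to be well-formed utf8 (there is no validation, as in the original function).

```cpp
#include <string>
#include <vector>

typedef unsigned char UCHAR;   // assumed typedefs; the post does not define them
typedef unsigned int uint;

// Hypothetical lookup tables, indexed by the lead byte of a sequence.
static int  utf8Lengths[256];  // total number of bytes in the sequence
static uint utf8Values[256];   // payload bits carried by the lead byte

// Assumed table layout: one plausible way to fill the two tables.
static bool initUtf8Tables()
{
    for (int c = 0; c < 256; ++c) {
        if (c < 0x80)      { utf8Lengths[c] = 1; utf8Values[c] = c; }         // 0xxxxxxx
        else if (c < 0xC0) { utf8Lengths[c] = 1; utf8Values[c] = 0xFFFD; }    // stray continuation byte
        else if (c < 0xE0) { utf8Lengths[c] = 2; utf8Values[c] = c & 0x1F; }  // 110xxxxx
        else if (c < 0xF0) { utf8Lengths[c] = 3; utf8Values[c] = c & 0x0F; }  // 1110xxxx
        else if (c < 0xF8) { utf8Lengths[c] = 4; utf8Values[c] = c & 0x07; }  // 11110xxx
        else               { utf8Lengths[c] = 1; utf8Values[c] = 0xFFFD; }    // never valid in utf8
    }
    return true;
}

// Fill the tables once, before main() runs.
static const bool utf8TablesReady = initUtf8Tables();

// The decoder from the post, unchanged apart from layout.
static inline uint getUnicodeChar(const char*& p)
{
    UCHAR c = *p++;
    int len = utf8Lengths[c];
    uint code = utf8Values[c];

    for (; len > 1; --len)
        code = (code << 6) | (*p++ & 0x3f);

    return code;
}

// Hypothetical helper: decode a whole well-formed utf8 string.
static std::vector<uint> decodeUtf8(const std::string& s)
{
    std::vector<uint> out;
    const char* p = s.data();
    const char* end = p + s.size();

    while (p < end)
        out.push_back(getUnicodeChar(p));

    return out;
}
```

As a usage example, decodeUtf8("\xD0\xB4\xD0\xB0, 5") takes the 7 bytes of the Russian "да, 5" and yields 5 code points. A single-byte Cyrillic character set would need 5 bytes for the same text, so the utf8 form is longer by a factor of 1.4, which fits the "less than a factor of two" estimate.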
--
Jim Starkey
Founder, NimbusDB, Inc.
978 526-1376