Subject: Re: [Firebird-Architect] Re: The Wolf on Firebird 3
Author: Olivier Mascia
On 17 Nov 2005, at 13:23, Jim Starkey wrote:

> So I'm going to ask the question again: What benefit, if any, does 16
> bit Unicode have over UTF-8 for internal representation?

None. utf-16 takes 2, and sometimes 4, bytes per character, whereas
utf-8 takes 1 to 4 bytes (not 6, as most people wrongly believe) and
utf-32 always takes 4.
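A small Python check of those byte counts, using only the standard codecs. The sample characters are my own choice for illustration. The `-le` codec variants are used so Python does not prepend a byte order mark. U+10FFFF, the highest code point Unicode defines, still needs only 4 bytes in utf-8; the 5- and 6-byte forms existed only in the original pre-RFC 3629 definition of UTF-8, which allowed 31-bit values.

```python
# Bytes needed per code point in each Unicode encoding form.
SAMPLES = {
    "A": "U+0041 (ASCII)",
    "\u00e9": "U+00E9 (e with acute)",
    "\u20ac": "U+20AC (euro sign)",
    "\U0001D11E": "U+1D11E (G clef, outside the BMP)",
    "\U0010FFFF": "U+10FFFF (highest code point)",
}


def encoded_sizes(ch: str) -> tuple[int, int, int]:
    """Return the (utf-8, utf-16, utf-32) byte counts for one code point."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # -le: no byte order mark added
        len(ch.encode("utf-32-le")),
    )


if __name__ == "__main__":
    for ch, name in SAMPLES.items():
        u8, u16, u32 = encoded_sizes(ch)
        print(f"{name:36} utf-8={u8} utf-16={u16} utf-32={u32}")
```

utf-16 is smaller than utf-8 only for characters such as the euro sign (3 bytes vs 2). It is larger for ASCII and ties for everything outside the BMP.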

utf-8 is of course more compact in memory than utf-32 (never larger,
and up to four times smaller for ASCII text).
While utf-32 as a native in-memory character size might look strange
at first, it might also be even simpler to handle, since every code
point occupies exactly one 32-bit unit. The price is a fee paid to
mister memory: a one-hundred-character string occupies 400 bytes.
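To make the "simpler to handle" point concrete, here is a minimal Python sketch comparing random access to the n-th character in each form. Both function names are hypothetical, made up for this illustration. With utf-32 the lookup is a single offset computation. With utf-8 the buffer has to be scanned from the start, counting lead bytes, i.e. bytes not of the continuation form `10xxxxxx`.

```python
def nth_code_point_utf32(buf: bytes, n: int) -> int:
    """Return code point n of a UTF-32LE buffer in O(1)."""
    if not 0 <= n < len(buf) // 4:
        raise IndexError(n)
    # Fixed width: code point n always starts at byte offset 4 * n.
    return int.from_bytes(buf[4 * n : 4 * n + 4], "little")


def nth_code_point_utf8(buf: bytes, n: int) -> int:
    """Return code point n of a UTF-8 buffer by scanning from the start, O(n)."""
    seen = -1
    for i, b in enumerate(buf):
        # A lead byte is any byte that is not a continuation byte (10xxxxxx).
        if (b & 0xC0) != 0x80:
            seen += 1
            if seen == n:
                # Extend over this sequence's continuation bytes, then decode it.
                j = i + 1
                while j < len(buf) and (buf[j] & 0xC0) == 0x80:
                    j += 1
                return ord(buf[i:j].decode("utf-8"))
    raise IndexError(n)


if __name__ == "__main__":
    text = "A\u00e9\u20ac\U0001D11E" * 25  # 100 code points
    u8, u32 = text.encode("utf-8"), text.encode("utf-32-le")
    print(len(u8), len(u32))  # bytes used by each encoding form
    print(hex(nth_code_point_utf8(u8, 99)), hex(nth_code_point_utf32(u32, 99)))
```

For this mixed 100-character string, utf-8 needs 250 bytes and utf-32 the 400 bytes mentioned above. Both lookups return the same code point, but only the utf-32 one avoids walking the buffer.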

I'm a strong supporter of the idea of using utf-8 internally and at
the storage level.
I'm just thinking out loud about whether utf-32 in memory with utf-8
on storage could be an alternative worth considering, despite the
obviously increased memory requirements. I'll experiment with this
idea using a string class patched to use int32_t as its char type,
and see how it affects a large project here before making up my mind
about this (maybe silly) additional idea. It might turn out that the
machine code handling 32-bit strings on 32-bit and 64-bit processors
gives an advantage that offsets the increased memory required.
It might just as well turn out to be a dumb idea. Until further
study, I'd put the odds of it being a dumb idea at 70%.
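The split being considered (utf-8 on storage, one 32-bit unit per character in memory) comes down to a conversion at the load/store boundary. A minimal Python sketch follows. It is an analogue of the int32_t-based string class, not that class itself, and `load_from_storage`/`store_to_storage` are hypothetical names. It picks whichever unsigned `array` typecode is 4 bytes wide on the running platform.

```python
from array import array

# Hypothetical in-memory representation: one unsigned 32-bit unit per code
# point. Pick whichever array typecode is 4 bytes wide on this platform.
CODE_UNIT = next(t for t in "IL" if array(t).itemsize == 4)


def load_from_storage(stored: bytes) -> array:
    """Decode compact UTF-8 storage bytes into fixed-width 32-bit units."""
    return array(CODE_UNIT, (ord(ch) for ch in stored.decode("utf-8")))


def store_to_storage(units: array) -> bytes:
    """Re-encode fixed-width code points as UTF-8 for storage."""
    return "".join(map(chr, units)).encode("utf-8")


if __name__ == "__main__":
    stored = "Firebird \u20ac".encode("utf-8")  # 12 bytes on disk
    units = load_from_storage(stored)
    print(len(stored), len(units) * units.itemsize)  # storage vs memory bytes
    assert store_to_storage(units) == stored  # lossless round trip
```

Here the 12 stored bytes become 40 bytes in memory. Whether fixed-width handling wins that cost back is exactly what the experiment above would have to measure.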

--
Olivier