Subject Re: [Firebird-Architect] UTF-8 over UTF-16 WAS: Applications of Encoded Data Streams
Author Jim Starkey
Svend Meyland Nicolaisen wrote:

>I have lately wondered why UTF-8 generally seems to be preferred over UTF-16.
>I can understand the use of UTF-8 in applications that need to maintain
>backward compatibility with the US-ASCII character set and/or mainly use
>characters from the US-ASCII character set.
>Also, if you need to "allocate" space for an X character wide text field in a
>database like Firebird, I would think that you need to allocate space for
>the worst case scenario, which is 4 times X bytes for both UTF-8 and UTF-16.
>So the potential compression of UTF-8 doesn't help much here.
>
>
>
Do you have any statistical data to show that UTF-16 consumes fewer bytes
than UTF-8?
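
One way to gather such numbers is to encode representative text both ways and
compare byte counts. The following is only a minimal sketch in Python: the
sample strings and the helper name byte_lengths are hypothetical and not taken
from Firebird or from this thread, and real statistics would need a realistic
corpus of the data the database actually stores.

```python
# Hypothetical sample strings chosen for illustration; not data from the thread.
samples = {
    "ascii": "Firebird",
    "latin": "M\u00fcller",             # one non-ASCII Latin letter
    "greek": "\u03b1\u03b2\u03b3",      # three Greek letters
    "cjk": "\u65e5\u672c\u8a9e",        # three CJK ideographs
    "emoji": "\U0001F600",              # one code point outside the BMP
}


def byte_lengths(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in samples.items():
    utf8_len, utf16_len = byte_lengths(text)
    print(f"{name}: {len(text)} code points, "
          f"UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

In this small sample, ASCII and mostly-Latin text is shorter in UTF-8, the CJK
text is shorter in UTF-16, and the code point outside the BMP takes 4 bytes in
both encodings, which matches the 4-bytes-per-character worst case quoted above.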

Part of the motivation for developing the data stream encoding is to
move away from pre-allocation and other assumptions concerning physical
length altogether.