Subject Re: [firebird-support] Re: Using unicode versus WIN1252 (Firebird
Author Dmitry Yemanov
Milan Babuskov wrote:
>
>> German characters mostly have ASCII < 7F. With UTF8 these characters
>> have the same storage size as in WIN1252, right?
>
> Only when represented in UTF8 form. However, Firebird represents them
> as 4 bytes per character internally.

It depends on what you mean by "internally". UTF8 strings are always
stored in the UTF8 encoding, but the corresponding in-memory buffers have
(4 * num_chars) bytes allocated for the string. The trailing zero bytes
left unused after UTF8 encoding are simply wasted in memory, and they are
RLE-compressed when the record is written to disk.
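
To make the arithmetic concrete, here is a minimal Python sketch (not
Firebird code). The function name utf8_buffer_stats is hypothetical, and
it assumes num_chars is the declared length of the column:

```python
def utf8_buffer_stats(text: str, num_chars: int) -> dict:
    """Compare encoded sizes with the (4 * num_chars) buffer described above.

    Illustration only: `num_chars` is assumed to be the declared column
    length, and none of this is taken from the Firebird sources.
    """
    if len(text) > num_chars:
        raise ValueError("text is longer than the declared column length")
    utf8_len = len(text.encode("utf-8"))
    buffer_len = 4 * num_chars  # worst case: 4 bytes per UTF8 character
    return {
        "win1252_bytes": len(text.encode("cp1252")),
        "utf8_bytes": utf8_len,
        "buffer_bytes": buffer_len,
        "unused_bytes": buffer_len - utf8_len,  # wasted in memory only
    }


# 17 characters, three of them (ü, ß, ü) outside ASCII, in a VARCHAR(50)
stats = utf8_buffer_stats("Grüße aus München", 50)
print(stats)
```

For this string the UTF8 form is only 3 bytes longer than the WIN1252
form (20 versus 17), while the buffer for a 50-character column is 200
bytes, so 180 bytes go unused.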

So I don't see why an ASCII-only UTF8 string would occupy much more
space than the same string in the ASCII charset, unless your CHAR/VARCHAR
columns are quite long and the text actually stored is much shorter (in
which case the RLE-compressed padding itself adds a noticeable overhead).
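
The effect of long, mostly empty columns can be sketched with a toy
run-length encoder in Python. This is an assumption-laden illustration,
not Firebird's actual on-disk format: rle_size, padded, and the 127-byte
run limit are all hypothetical, and the padding is taken to be zero bytes
as described above.

```python
def rle_size(data: bytes, max_run: int = 127) -> int:
    """Size in bytes of `data` under a toy RLE scheme (NOT Firebird's).

    Assumed scheme: a run of 3+ identical bytes costs 2 bytes (count,
    value) per chunk of up to `max_run` bytes; everything else is stored
    literally, with 1 control byte per chunk of up to `max_run` literals.
    """
    def literal_cost(count: int) -> int:
        return count + -(-count // max_run)  # payload + control bytes

    size, literals, i, n = 0, 0, 0, len(data)
    while i < n:
        j = i
        while j < n and data[j] == data[i]:
            j += 1
        run = j - i
        if run >= 3:
            size += literal_cost(literals)    # flush pending literals
            literals = 0
            size += 2 * -(-run // max_run)    # one (count, value) pair per chunk
        else:
            literals += run
        i = j
    return size + literal_cost(literals)


def padded(text: str, buffer_bytes: int) -> bytes:
    """ASCII text followed by zero padding up to `buffer_bytes`."""
    raw = text.encode("ascii")
    return raw + bytes(buffer_bytes - len(raw))


for num_chars in (10, 1000):
    one_byte = rle_size(padded("Hallo", num_chars))      # 1 byte per char
    utf8 = rle_size(padded("Hallo", 4 * num_chars))      # 4 bytes per char
    print(num_chars, one_byte, utf8)
```

Under these assumptions, a 10-character column compresses to 8 bytes in
both cases, while a 1000-character column holding the same 5 letters
compresses to 22 bytes with a one-byte charset and 70 bytes with the
UTF8 buffer. The difference comes only from the extra run markers needed
to cover the longer padding.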


Dmitry