Subject | Re: [firebird-support] Re: Using unicode versus WIN1252 (Firebird |
---|---|
Author | Dmitry Yemanov |
Post date | 2009-01-11T09:41:28Z |
Milan Babuskov wrote:
>> German characters mostly have ASCII < 7F. With UTF8 these characters
>> have the same storage size as in WIN1252, right?
>
> Only when represented in UTF8 form. However, Firebird internally
> represents them as 4-bytes-per-character internally.
It depends on what you mean by "internally". All UTF8 strings are always
stored in the UTF8 encoding, but the appropriate in-memory buffers have
(4 * num_chars) bytes allocated for the string. The trailing zero bytes
left unused after the UTF8 encoding are simply wasted in memory, and they
are RLE-compressed when written to disk.
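To put numbers on the buffer arithmetic above, here is a minimal Python sketch. It only illustrates the (4 * num_chars) rule described in this message and is not Firebird code; the function name `utf8_buffer_usage` and the 20-character column length are made up for the example.

```python
def utf8_buffer_usage(text: str, declared_chars: int) -> dict:
    """Compare the UTF-8 encoded size of `text` with a worst-case buffer.

    Assumption (taken from the description above): the in-memory buffer for
    a UTF8 column of `declared_chars` characters is 4 * declared_chars bytes.
    """
    if len(text) > declared_chars:
        raise ValueError("text is longer than the declared column length")
    encoded = text.encode("utf-8")
    buffer_bytes = 4 * declared_chars
    return {
        "encoded_bytes": len(encoded),
        "buffer_bytes": buffer_bytes,
        "unused_bytes": buffer_bytes - len(encoded),
    }


# Hypothetical 20-character column holding a short German word.
ascii_usage = utf8_buffer_usage("Strasse", 20)   # ASCII characters only
german_usage = utf8_buffer_usage("Straße", 20)   # "ß" takes 2 bytes in UTF-8
print(ascii_usage)
print(german_usage)
```

Both strings encode to only a handful of bytes, so almost all of the 80-byte buffer is unused padding in either case.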
So I don't see why an ASCII-only UTF8 string would occupy much
more space than the same string in the ASCII charset, unless your CHAR/VARCHAR
columns are quite long and the text actually stored in them is much shorter (in
which case RLE adds a noticeable overhead).
Dmitry
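The "long column, short text" case can be estimated with a toy run-length encoder. This is a rough sketch only: the scheme below (2 bytes per repeated run of up to 127 bytes, 1 control byte per literal chunk) is an assumption made for illustration and is not claimed to match Firebird's actual on-disk format. The helpers `rle_size` and `padded` are hypothetical, and any VARCHAR length prefix is ignored.

```python
def rle_size(data: bytes, max_chunk: int = 127) -> int:
    """Return the size in bytes of a toy run-length encoding of `data`.

    Toy scheme (an assumption, not Firebird's real format):
      - a run of 3 or more identical bytes costs 2 bytes (control + value)
        for every chunk of up to `max_chunk` repeated bytes;
      - all other bytes are stored literally, with 1 control byte for every
        chunk of up to `max_chunk` literal bytes.
    """
    size = 0
    literal = 0  # literal bytes seen but not yet accounted for
    i, n = 0, len(data)
    while i < n:
        j = i
        while j < n and data[j] == data[i]:
            j += 1
        run = j - i
        if run >= 3:
            # Flush pending literals: the bytes plus one control byte per chunk.
            size += literal + -(-literal // max_chunk)
            literal = 0
            # Each chunk of the repeated run costs two bytes.
            size += 2 * -(-run // max_chunk)
        else:
            literal += run
        i = j
    # Flush any literals left at the end of the buffer.
    size += literal + -(-literal // max_chunk)
    return size


def padded(text: str, encoding: str, bytes_per_char: int, declared_chars: int) -> bytes:
    """Encode `text` and zero-pad it to bytes_per_char * declared_chars bytes."""
    encoded = text.encode(encoding)
    return encoded + bytes(bytes_per_char * declared_chars - len(encoded))


# Short column: the padding collapses to a single run in both charsets.
short_utf8 = rle_size(padded("Hello", "utf-8", 4, 10))
short_1252 = rle_size(padded("Hello", "cp1252", 1, 10))

# Long column with short text: the UTF8 padding is four times longer,
# so it needs more run chunks.
long_utf8 = rle_size(padded("Hello", "utf-8", 4, 1000))
long_1252 = rle_size(padded("Hello", "cp1252", 1, 1000))
print(short_utf8, short_1252, long_utf8, long_1252)
```

Under this toy scheme the two charsets cost the same for the 10-character column (8 bytes each), while the 1000-character column costs 70 bytes as UTF8 against 22 bytes single-byte. The difference comes entirely from the compressed padding, not from the text itself.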