Subject Re: [firebird-support] CHAR and VARCHAR disk space
Author Ann W. Harrison
Adomas Urbanavicius wrote:
> Hello,
> But according to claimed VARCHAR disk saving space,

Varchar will use more space than char on disk - no one disputes that, or
any one who does should have been corrected. Varchar fields have a two
byte length at the front, followed by the character string. Char fields
have just the character string. Both are blank filled (unless they're
charset octets). Run length compression turns the trailing blanks into
two bytes per 255 characters.

If you store a single character 'A' in a 200 byte character field, what
will be written to disk (probably, it depends on the values in
surrounding fields), is 1 byte of length indicating that the next byte
is data, one byte with the character 'A', a second byte of length
(negative in this case) indicating that the next byte is to be repeated
199 times,
and the character ' ' <- blank. Total, 4 bytes.

If you store a single character 'A' in a varchar field, what will be
written to disk (same caveat) is one byte indicating that the next three
bytes are significant, three bytes containing 0x0, 0x1, and 'A',
followed by a second length byte containing -199 that indicates that the
next byte is to be repeated 199 times, then the trailing byte of blank.
Total 6 bytes. 50% more storage.

> This shows that varchar uses up to 10% more of disk space.

Your test includes fixed page and record overhead. Other than that,
it's the second worst case - worst case is an all blank string following
a string with trailing blanks. In that case, the char field takes no
space at all, while the varchar takes four bytes.

>
> Made selection :
> select field1 from table1
> group by field1
> CHAR DB : ~55sec
> VARCHAR : ~65sec

Lots of other factors can affect performance, without a lot more detail
I'd be hesitant to say which type was faster. One area where varchar
has a performance advantage is remote transmission. There, the trailing
blanks are passed for char and not for varchar.

My advice is to use char for values that are actually fixed length -
postal codes, US social security numbers, etc., and varchar for values
that vary in length. The time you spend trimming off trailing blanks
totally negates any tiny saving in on disk space.

Regards.


Ann