Subject Varchar size overhead
Author Douglas Tosi
Hello!

Today while doing some tests I noticed something odd:
Suppose the following tables with varchar fields of different sizes:
create table test1(t1 varchar(100) character set ascii)
create table test2(t2 varchar(2000) character set ascii)

I filled each table with 1 million records. The source data is the
same for both and each row is filled with 32 characters. Since this is
a varchar field and the source data is the same, I was hoping both
tables would have roughly the same size. But there is a big
difference:

test1 (varchar(100))
Data Pages: 10102
Average Fill: 70%
Average Record Length: 41

test2 (varchar(2000))
Data Pages: 13514
Average Fill: 80%
Average Record Length: 71

This is a 73% increase in average record length. The only difference
is the size of the varchar field.
What causes this overhead?
I would assume the varchar is being allocated to the full size and
then compressed with RLE. The other day I learned UTF8 strings work
that way. Is it the same for any varchar?

Thanks,
--
Douglas Tosi
www.sinatica.com