Subject: Re: [firebird-support] Bug with character sets
Author: Ann W. Harrison
Brad Pepers wrote:

>> A VARCHAR is passed as two bytes of character length, plus
>> that many characters. So the actual stored character length
>> is passed for a VARCHAR. The buffer is fixed size and
>> probably padded with spaces.
>
> So if I select a column using UTF8 for the character set and it has
> two characters each of which takes up 2 bytes, I'll be told it is two
> characters long and not 4 bytes?
>
> If this works, why couldn't CHAR work the same way?

Because the length is part of the VARCHAR representation - the leading
two bytes are the length in characters. There is no such part to the
CHAR data type.
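To make the difference concrete, here is a simplified Python model of the two buffer layouts. It is not the Firebird client API, and the function names are hypothetical. It assumes the VARCHAR length prefix is an unsigned 16-bit little-endian integer that counts characters, as described above; real drivers may define the prefix differently (for example as a byte count), so check your driver's documentation. It also assumes buffers are sized at 4 bytes per declared character (the UTF-8 maximum) and padded with ASCII spaces.

```python
import struct

# Simplified model, NOT the Firebird client API. Assumptions:
#   * VARCHAR prefix: unsigned 16-bit little-endian, counting characters
#     (following the description in the post);
#   * buffers are 4 bytes per declared character, padded with ASCII spaces.
BYTES_PER_CHAR = 4


def _pad(data: bytes, declared_len: int) -> bytes:
    """Pad encoded data with spaces to the full fixed buffer width."""
    size = declared_len * BYTES_PER_CHAR
    if len(data) > size:
        raise ValueError("value does not fit in the declared length")
    return data.ljust(size, b" ")


def pack_varchar(text: str, declared_len: int) -> bytes:
    """VARCHAR-style buffer: 2-byte length, then data, then padding."""
    return struct.pack("<H", len(text)) + _pad(text.encode("utf-8"), declared_len)


def unpack_varchar(buf: bytes) -> str:
    """Read the length prefix and return exactly that many characters."""
    (n_chars,) = struct.unpack_from("<H", buf, 0)
    return buf[2:].decode("utf-8")[:n_chars]


def pack_char(text: str, declared_len: int) -> bytes:
    """CHAR-style buffer: data padded to full width, no length field."""
    return _pad(text.encode("utf-8"), declared_len)


def unpack_char(buf: bytes) -> str:
    """With no length field, the client can only decode the whole buffer."""
    return buf.decode("utf-8")


# Brad's CHAR(2) UTF8 case: one single-byte and one double-byte character.
print(unpack_varchar(pack_varchar("aé", 2)))   # aé
print(len(unpack_char(pack_char("aé", 2))))    # 7
```

The VARCHAR model hands back exactly the two stored characters, while the CHAR model hands back seven: "a", "é", and five padding spaces that look just like data. Stripping trailing spaces recovers "aé" here, but it would also remove a genuine trailing space, which the VARCHAR prefix preserves.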


> If selecting a
> CHAR(2) column that uses UTF8 and it has a single byte character and a
> double byte character, return back that it is two characters long.

"Return back" sounds simple, but there is no place in the call or
the structures passed to return that information. If you want the
length in characters, ask for the result as a VARCHAR so there is
a place to put the information you want.
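One way to ask for a VARCHAR without changing the schema is to cast in the query itself. The sketch below builds such a query string; `varchar_select` and the table and column names are hypothetical, and it relies only on the `CAST(... AS VARCHAR(n) CHARACTER SET UTF8)` syntax. Identifiers are interpolated directly, so this is only suitable for trusted, hard-coded names.

```python
def varchar_select(table: str, columns: dict[str, int]) -> str:
    """Build a SELECT that casts each column to VARCHAR(n) in UTF8.

    `columns` maps a column name to its declared length in characters.
    Hypothetical helper: names are inserted verbatim, with no quoting.
    """
    parts = [
        f"CAST({name} AS VARCHAR({length}) CHARACTER SET UTF8) AS {name}"
        for name, length in columns.items()
    ]
    return f"SELECT {', '.join(parts)} FROM {table}"


print(varchar_select("customers", {"code": 2}))
# SELECT CAST(code AS VARCHAR(2) CHARACTER SET UTF8) AS code FROM customers
```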

>
>>> Why is the CHAR buffer padded, but it seems VARCHAR isn't?
>> Both are padded. VARCHAR has the character length. CHAR
>> does not. Perhaps the easiest solution is to cast all strings
>> to VARCHAR if you're using a multi-byte (and especially a
>> variable length multi-byte) character set.
>
> That was one option I was looking at to make this work. It is a shame
> though to have to work around it like this.

Think a minute, people! You shouldn't be using CHAR for variable-length
data. If your data is fixed length, use CHAR - and know
what length you expect. If your data is variable, use VARCHAR.
That's what the VAR stands for.
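The rule of thumb can be written down as a tiny helper. This is a hypothetical illustration, not a schema-design tool: given sample values, it suggests CHAR(n) only when every value has the same length in characters, and otherwise VARCHAR sized to the longest value.

```python
def suggest_type(values: list[str]) -> str:
    """Suggest CHAR(n) for fixed-length samples, VARCHAR(max) otherwise."""
    if not values:
        raise ValueError("need at least one sample value")
    lengths = {len(v) for v in values}  # lengths in characters, not bytes
    longest = max(lengths)
    return f"CHAR({longest})" if len(lengths) == 1 else f"VARCHAR({longest})"


print(suggest_type(["US", "DE", "FR"]))  # CHAR(2)
print(suggest_type(["Ann", "Brad"]))     # VARCHAR(4)
```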

Cheers,

Ann