Subject Re: [firebird-support] Bug with character sets
Author Brad Pepers
On 20-May-09, at 8:49 AM, Ann W. Harrison wrote:

> Brad Pepers wrote:
>
>>> A VARCHAR is passed as two bytes of character length, plus
>>> that many characters. So the actual stored character length
>>> is passed for a VARCHAR. The buffer is fixed size and
>>> probably padded with spaces.
>>
>> So if I select a column using UTF8 for the character set and it has
>> two characters each of which takes up 2 bytes, I'll be told it is two
>> characters long and not 4 bytes?
>>
>> If this works, why couldn't CHAR work the same way?
>
> Because the length is part of the VARCHAR representation - the leading
> two bytes are the length in characters. There is no such part to the
> CHAR data type.
>
>
>> If selecting a
>> CHAR(2) column that uses UTF8 and it has a single byte character
>> and a
>> double byte character, return back that it is two characters long.
>
> "Return back" sounds simple, but there is no place in the call or
> the structures passed to return that information. If you want the
> length in characters, ask for the result as a VARCHAR so there is
> a place to put the information you want.
>
>>
>>>> Why is the CHAR buffer padded, but it seems VARCHAR isn't?
>>> Both are padded. VARCHAR has the character length. CHAR
>>> does not. Perhaps the easiest solution is to cast all strings
>>> to VARCHAR if you're using a multi-byte (and especially a
>>> variable length multi-byte) character set.
>>
>> That was one option I was looking at to make this work. It is a
>> shame
>> though to have to work around it like this.
>
> Think a minute people! You shouldn't be using CHAR for variable
> length data. If your data is fixed length, use CHAR - and know
> what length you expect. If your data is variable, use VARCHAR.
> That's what the VAR stands for.

In my case I've written a C++ wrapper around the C API to make it
easier to use and work well with Qt (yes I know there is now a
Firebird driver included with Qt but my code pre-dates this!) so I get
a select statement and I execute it and want to handle the results and
build proper Qt strings using UTF8. I don't know the size of the CHAR
columns and expect to be able to get this from the C API. I'm not
using CHAR for variable length data but do need to know the length in
characters of the CHAR columns I'm receiving. Anyone else who is also
writing some generic layer on top of the C API (like FlameRobin or any
other wrapper class for C++/C#/...) will also require this and the
current API is really ugly for it. It is not something I think any of
us are doing wrong or some mistake on our part. It looks like a wart
on the C API that we have to work around by doing something like a
select before hand to get the max bytes for each character set and
then use the character set id we do get from the C API and from that
we can finally get the length of the column in characters.

--
Brad