Subject	Re: [firebird-support] Bug with character sets
Author	Kjell Rilbe
Post date	2009-05-20T08:08:16Z

Dimitry Sibiryakov wrote:

>>>This last step would probably require a lot of "knowledge" in fbclient
>>>of all possible character encodings, which I expect would bloat that
>>>dll
>>>a bit more than anyone would like. Or would it?
>>
>>Or just have a character length field in the data you receive and have
>>the server calculate this using the character set information it must
>>already have so on the client side I just get data I can use without
>>jumping through hoops!
>
> But you won't get the data anyway. I don't think that replacing some
> space characters with zero characters in data buffer will somehow help you.
> I can't understand why you are so determined to use CHAR when you
> obviously need VARCHAR. They are different datatypes with different
> purposes. You must not bend one to serve for purposes of other.

But what makes you say that all these discussions are based on needing
varchar while trying to use char?

The problem is that when you do use char and want char, you expect to
get N characters for a char(N) column. But instead you get anything from
N to 4N characters (for utf8), and the struct describing the data does
not tell you the value of N.

You can determine N by combining info from the struct with info from
the system table that describes charsets, but that's a rather indirect
way of solving the problem.
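
To illustrate the indirect route, here is a minimal sketch in C. It assumes
that the charset id is taken from the low byte of XSQLVAR.sqlsubtype and
that the maximum bytes per character has already been fetched with
something like
SELECT RDB$BYTES_PER_CHARACTER FROM RDB$CHARACTER_SETS WHERE RDB$CHARACTER_SET_ID = ?
The helper name declared_char_length is hypothetical and not part of the
Firebird API; it only does the final division.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Firebird client API.
 *
 * byte_len:           buffer size in bytes reported for a CHAR column
 *                     (assumed to come from XSQLVAR.sqllen)
 * max_bytes_per_char: RDB$BYTES_PER_CHARACTER of the column's charset
 *                     (4 for UTF8, 1 for single-byte charsets)
 *
 * Returns the declared length N of CHAR(N), or 0 if the inputs are
 * inconsistent (zero divisor, or byte_len not a multiple of it). */
size_t declared_char_length(size_t byte_len, size_t max_bytes_per_char)
{
    if (max_bytes_per_char == 0 || byte_len % max_bytes_per_char != 0)
        return 0;
    return byte_len / max_bytes_per_char;
}
```

For a CHAR(10) CHARACTER SET UTF8 column the buffer is 40 bytes, so
declared_char_length(40, 4) gives 10.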

The more direct and intuitive way would be that the returned struct
contains N.

As far as I can see, it could be passed in XSQLVAR.sqlscale without
breaking legacy code.

This would not be sufficient for fbclient to trim the buffer since it
doesn't know how to parse the various encodings (charsets).

So perhaps it would be better to pass the actually used buffer size in
bytes in sqlscale. That would enable fbclient to pass back a buffer to
the application that's the right size in codepoints, which is what I
believe is most intuitive.

This would include trailing spaces up to N codepoints, as opposed to the
current situation where it may be padded with spaces up to anything from N
to 4N codepoints (for utf8), and you don't really know how many of them
are significant.
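
As a rough sketch of what cutting the buffer down to N codepoints could
look like on the application side for utf8, assuming the buffer holds
well-formed UTF-8 and N is already known: the function name
utf8_prefix_bytes is made up for this example and is not something
fbclient provides.

```c
#include <stddef.h>

/* Hypothetical helper: returns how many bytes the first n code points
 * of a UTF-8 buffer occupy. Assumes well-formed UTF-8. If the buffer
 * holds fewer than n code points, the whole length is returned. */
size_t utf8_prefix_bytes(const unsigned char *buf, size_t len, size_t n)
{
    size_t i = 0;     /* current byte offset */
    size_t seen = 0;  /* code points started so far */

    while (i < len) {
        /* Bytes of the form 10xxxxxx are continuation bytes; any
         * other byte starts a new code point. */
        if ((buf[i] & 0xC0) != 0x80) {
            if (seen == n)
                break;  /* code point n+1 starts here: stop before it */
            seen++;
        }
        i++;
    }
    return i;
}
```

For a CHAR(3) utf8 column holding "åb", the 12-byte buffer contains the
two bytes of "å", then "b", then nine spaces. Calling
utf8_prefix_bytes(buf, 12, 3) returns 4, which covers "åb" plus the one
padding space that belongs to the declared length.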

I'd say the current situation is the one that would make everyone use
varchar while actually wanting char, instead of the opposite like you wrote.

Kjell
--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: kjell@...
Telefon: 08-761 06 55
Mobil: 0733-44 24 64