Subject | Re: [firebird-support] Bug with character sets |
---|---|
Author | Kjell Rilbe |
Post date | 2009-05-20T10:24:22Z |
Dimitry Sibiryakov wrote:
>>But the application's work would be a lot easier if N (or
>>"bytesactuallyused") were contained in the (X)SQLVAR struct.
>
> And good news are: "bytesactuallyused" is contained. For VARCHAR.

But we're not discussing varchar here. Or are you saying that char is
actually totally deprecated and everyone should use varchar even for
fixed-width data? Not!

>>Yes, if the struct docs say that the buffer will be padded with spaces
>>up to N codepoints and \0 beyond that. But that would also require that
>>\0 can't occur in the actual data value. I suppose that is and should
>>still be allowed, so in that case the change in padding would not help.
>
> \0 is allowed in character set OCTETS only. All other character sets,
> including NONE, prohibit it.

Good! Then \0 padding seems like a viable option, as an alternative to
(or in combination with) my suggestion of passing N or bytesactuallyused
in XSQLVAR.sqlscale. But I would expect that passing N in
XSQLVAR.sqlscale would break less legacy code.
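A minimal Python sketch of the \0-padding alternative mentioned above follows. It models a hypothetical buffer layout, not anything Firebird does today: the value is space-padded to N codepoints and the rest of an N * 4-byte buffer (4 bytes per UTF8 character is an assumption) is filled with NUL bytes. Since NUL cannot occur outside character set OCTETS, the client can strip the NUL filler and recover exactly N codepoints, including a meaningful trailing space, without being told N. The function names are illustrative only.

```python
MAX_BYTES_PER_CHAR = 4  # assumption: buffer sized for 4-byte UTF8 characters


def pad_hypothetical(value: str, n: int) -> bytes:
    """Hypothetical layout from the proposal: spaces up to N codepoints, then NUL bytes."""
    if len(value) > n:
        raise ValueError(f"value does not fit in CHAR({n})")
    data = value.ljust(n).encode("utf-8")  # CHAR(N) semantics: space-pad to N codepoints
    return data + b"\0" * (n * MAX_BYTES_PER_CHAR - len(data))


def unpad_hypothetical(buf: bytes) -> str:
    """Strip the NUL filler; what remains is exactly the N-codepoint CHAR value.

    Only safe for character sets other than OCTETS, where NUL cannot occur in data.
    """
    return buf.rstrip(b"\0").decode("utf-8")


for stored in ("NÄ", "NN", "N "):
    print(repr(unpad_hypothetical(pad_hypothetical(stored, 2))))  # 'NÄ', then 'NN', then 'N '
```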

>>But note that no one has asked for that change in padding. What we've
>>been asking for is N or "bytesactuallyused". And that *would* help.
>
> And this is exactly the reason for VARCHAR existence - to have actual
> length on the string.

I don't want the actual length of the string (which amounts to stripping
trailing whitespace).
Let's assume I have a char(2) charset utf8 column. In one record I have
'NÄ' and in another I have 'NN'. When I select from this, I expect two
values of the same length, i.e. 2 codepoints each. But that's not what I
get. I get one record containing 'NÄ<5_spaces>' and another containing
'NN<6_spaces>'.
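The lengths above are consistent with the column arriving as an 8-byte buffer, i.e. 4 bytes reserved per UTF8 character and filled with spaces: 'NÄ' uses 3 bytes (Ä takes 2), leaving 5 spaces, while 'NN' uses 2, leaving 6. The Python sketch below simulates that buffer; the 4-bytes-per-character sizing and the helper name `simulate_char_buffer` are assumptions for illustration, not Firebird client code.

```python
MAX_BYTES_PER_CHAR = 4  # assumption: UTF8 CHAR(N) is sized as N * 4 bytes


def simulate_char_buffer(value: str, n: int) -> bytes:
    """Simulate a CHAR(n) UTF8 value as fetched: its UTF8 bytes, space-padded to n * 4 bytes."""
    if len(value) > n:
        raise ValueError(f"value does not fit in CHAR({n})")
    data = value.encode("utf-8")
    return data + b" " * (n * MAX_BYTES_PER_CHAR - len(data))


for stored in ("NÄ", "NN"):
    text = simulate_char_buffer(stored, 2).decode("utf-8")
    print(repr(text), len(text))  # 'NÄ' gets 5 trailing spaces, 'NN' gets 6
```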
It's not just that these two values are not 2 characters. They aren't
even the same length! This is *very* *strange* considering I've selected
from a fixed-length column! OK, so if I trim trailing whitespace I get
what I want, but:
Now, add a record with 'N<space>'. When selecting, I probably want to
retain that trailing space; otherwise I would have used varchar! So now,
if I try to get the right size by trimming whitespace, I get 'NÄ'
(correct), 'NN' (correct), and 'N' (WRONG!!!!).
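Using the same kind of simulated space-padded buffer (again an assumption, with a hypothetical helper `received_text`), a short Python sketch shows the trimming problem: `rstrip()` restores 'NÄ' and 'NN' but also removes the stored trailing space from 'N '.

```python
def received_text(value: str, n: int, max_bytes_per_char: int = 4) -> str:
    """Decode a simulated CHAR(n) UTF8 buffer, space-padded to n * max_bytes_per_char bytes."""
    data = value.encode("utf-8")
    return (data + b" " * (n * max_bytes_per_char - len(data))).decode("utf-8")


stored = ["NÄ", "NN", "N "]
trimmed = [received_text(v, 2).rstrip() for v in stored]
print(trimmed)  # ['NÄ', 'NN', 'N']: the stored trailing space is gone
```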
From the application it's not always easy to determine N, because the
select may very well contain concatenations etc. So, when faced with
data such as in the example, how is my application supposed to e.g. pad
it to a fixed-width (in codepoints) text file? Or to display it with
proper alignment onscreen with a fixed-width font? Or whatever other use
I might have for fixed-width data? I don't get fixed-width data when I
expect it, and even when all values coincidentally happen to have the
same width, it may in fact be a totally *incorrect* fixed width.
The server *knows* N. Why is it so scary to consider passing that info
to fbclient, to be forwarded to the application? I don't expect fbclient
to do anything with it, just pass it along. Then it is up to the
application to parse the buffer and extract exactly N codepoints. But it
*has* to know N!
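If N were passed along, for example in XSQLVAR.sqlscale as suggested in this thread, extracting the value would be simple: decode the buffer and keep the first N codepoints, since padding only ever follows the value. The Python sketch below assumes the same space-padded 8-byte buffers as in the char(2) utf8 example; `FetchedChar` and `extract_value` are hypothetical names, not part of any Firebird API.

```python
from dataclasses import dataclass


@dataclass
class FetchedChar:
    """Hypothetical stand-in for a fetched CHAR column: raw buffer plus N."""
    data: bytes
    char_length: int  # N in codepoints: known to the server, not reported to the client


def extract_value(field: FetchedChar) -> str:
    """Keep exactly N codepoints; the padding always follows the value."""
    return field.data.decode("utf-8")[: field.char_length]


# Space-padded 8-byte buffers, as in the CHAR(2) UTF8 example.
buffers = [
    "NÄ".encode("utf-8") + b" " * 5,
    b"NN" + b" " * 6,
    b"N " + b" " * 6,
]
values = [extract_value(FetchedChar(b, 2)) for b in buffers]
for v in values:
    print(f"|{v}|")  # every value is exactly 2 codepoints wide, so columns line up
```

Each result is exactly 2 codepoints wide, 'N ' included, which is what fixed-width file output or aligned display with a fixed-width font needs.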
Milan's solution works, but I agree with Martijn that it's really weird
that an application should have to query a system table to do something
that really should be done by the API.
Kjell
--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-mail: kjell@...
Phone: 08-761 06 55
Mobile: 0733-44 24 64