Subject | Re: [firebird-support] Bug with character sets |
---|---|
Author | Kjell Rilbe |
Post date | 2009-05-20T11:00:49Z |
Dmitry Yemanov wrote:
> Kjell Rilbe wrote:
> >
> > The server *knows* N.
>
> If your N means "bytes actually used", then you're wrong.

Nowhere in this thread have I used N to refer to anything except the N
in char(N).

> CHARs are not
> padded when transmitted. They're stored padded and the padding character
> is a proper part of the string (accordingly to the SQL standard, BTW).
> The engine doesn't care how many non-spaces are there, so it cannot help
> you with knowing N.

If that were true, then this:
-- with default charset utf8
create table X (A char(2), B char(2));
insert into X (A, B) values ('A ', 'B ');
select A || B from X;
would produce this string:
'A<7_spaces>B<7_spaces>'
This is clearly not the case. It produces the correct result:
'A<singlespace>B<singlespace>'
However, that would then be presented to the application, by fbclient, as:
'A<singlespace>B<13spaces>'
Now, my application wants the correct result with two single spaces. How
does it get that today? It has to query the system table to find the
right factor 4 for utf8 and divide the buffer size 16 by 4 and get the
codepoint length 4, to be able to parse/trim the buffer correctly.
If the codepoint length 4 were passed in the SQLDA struct, it would at
least be easier, because it would not have to do the extra round-trip to
query the system table. It would be much more direct.
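
The client-side arithmetic described above can be sketched roughly as
follows. This is only an illustration in Python, not fbclient code: the
function name trim_char_buffer and its parameters are hypothetical, and
the factor 4 is assumed to be known already (today it would come from
the system tables).

```python
def trim_char_buffer(buf: bytes, bytes_per_char: int, encoding: str = "utf-8") -> str:
    """Cut a space-padded CHAR buffer back to its declared length in characters.

    Hypothetical helper: bytes_per_char is the character set's maximum
    bytes per character (4 for utf8 in the example).
    """
    # Declared length in code points = buffer size / max bytes per character.
    char_len = len(buf) // bytes_per_char
    # Decode the whole buffer, then keep only the first char_len code points;
    # everything after that is padding added to fill the byte buffer.
    return buf.decode(encoding)[:char_len]


# The 16-byte buffer from the example: 'A', space, 'B', then 13 spaces.
raw = b"A B" + b" " * 13
result = trim_char_buffer(raw, bytes_per_char=4)
```

Here result is the 4-character string with two single spaces, which is
what the application wanted. If the codepoint length were delivered
directly, the division would not be needed at all.
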

> The same goes for "characters actually used", as the
> only fact it knows is that CHAR_LENGTH(CHAR(X)) == X, again regardless
> of how many non-spaces are there.

Kjell
--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: kjell@...
Telefon: 08-761 06 55
Mobil: 0733-44 24 64