Subject | Re: [firebird-support] Right-padded char fields? |
---|---|
Author | Olivier Mascia |
Post date | 2008-09-01T13:58:12Z |
On 01 Sep 2008 at 14:08, Kjell Rilbe wrote:
> I assume the clients can strip the superfluous blanks (codepoints
> beyond the specified char(n) length n), but it seems like an odd
> thing to require from clients. If a column is supposed to contain n
> codepoints you shouldn't get up to n*4 codepoints returned from the
> API. It's counterintuitive.
>
> The only reason I can see to do it like that is to maintain backward
> compatibility in the API, and that is the problem I humbly suggested a
> solution to.

An intermediate driver like IBPP could indeed deduce the codepoint
size from the physical size by dividing it by 4, assuming it knows at
that point in the code that the column we deal with is UTF8.
Not something clever, though.
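
To make that deduction concrete, here is a minimal sketch in Python. It is not IBPP code: the function names are made up for illustration, and it assumes the driver already knows the column is UTF8 and that 4 bytes are reserved per declared codepoint, as discussed in this thread.

```python
# Assumption from the thread: a UTF8 CHAR(n) column is reported with n*4 bytes.
UTF8_BYTES_PER_CODEPOINT = 4


def declared_codepoints(sqllen):
    """Deduce n of CHAR(n) from the byte length reported for a UTF8 column."""
    return sqllen // UTF8_BYTES_PER_CODEPOINT


def trim_to_declared(raw, sqllen):
    """Decode the fetched buffer and keep only the first n codepoints.

    Hypothetical helper: it assumes the buffer holds valid UTF-8 and that
    anything past the declared codepoint count is superfluous padding.
    """
    text = raw[:sqllen].decode("utf-8")
    return text[:declared_codepoints(sqllen)]


# "AB" in a UTF8 CHAR(2) column, blank-padded to the full 8 bytes.
print(repr(trim_to_declared(b"AB      ", 8)))
```

The trimming happens on decoded codepoints rather than on bytes, so a multi-byte character is never cut in half.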
I have not checked the bits and bytes of the protocol lately (I must
confess), but when the engine sends a UTF8 CHAR(2) back to the
client, the space allocated in the XSQLVAR will be 8 bytes and the
sqllen field will return 8 to reflect that. But what about the
content of these 8 bytes?
Assume the content is the letters "AB" (so no multi-byte codepoints
involved). What will actually be delivered to the client in the
8-byte buffer?
1) 'ABss0000' (s being a space, 0 being a binary 0)
Or
2) 'ABssssss'
Or
3) 'AB000000'
What format is guaranteed to arrive at the client side as a result of
a fetch through the API?
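
For reference, the three candidate buffers can be written out as byte strings and told apart by their trailing padding. This is only a sketch: `padding_profile` is a hypothetical helper, and nothing here says which layout the engine really delivers.

```python
# The three candidate 8-byte buffers exactly as listed above
# (s = space, 0 = binary zero).
candidates = {
    1: b"AB  \x00\x00\x00\x00",
    2: b"AB      ",
    3: b"AB\x00\x00\x00\x00\x00\x00",
}


def padding_profile(raw):
    """Return (trailing_blanks, trailing_zero_bytes) for a fetched buffer."""
    without_zeroes = raw.rstrip(b"\x00")
    without_blanks = without_zeroes.rstrip(b" ")
    blanks = len(without_zeroes) - len(without_blanks)
    zeroes = len(raw) - len(without_zeroes)
    return blanks, zeroes


for number, raw in candidates.items():
    print(number, len(raw), padding_profile(raw))
```

All three buffers are 8 bytes long, so sqllen alone cannot tell the client which layout it received.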
Option 1) would be the preferred one, where the engine itself (or its
interfaces at least) *properly* right-pads with blanks, that is up
to the number of codepoints declared (2 in this case), and then
right-pads the remainder with zeroes.
Then interface layers can do the right thing, easily.
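
A sketch of how little an interface layer would have to do under that layout. It assumes the engine pads with blanks up to the declared codepoint count and zero-fills the rest, which is the wished-for behaviour and not a confirmed one. The function name is made up for illustration.

```python
def char_value_from_zero_filled(raw):
    """Recover a CHAR(n) value from a buffer whose unused bytes are zero-filled.

    Assumed layout: value, blank padding up to n codepoints, then binary
    zeroes up to the physical size. Stripping the zeroes is all that is needed.
    """
    return raw.rstrip(b"\x00").decode("utf-8")


# UTF8 CHAR(2) column holding "A": one blank completes the two codepoints,
# six zero bytes fill the remaining physical space.
print(repr(char_value_from_zero_filled(b"A \x00\x00\x00\x00\x00\x00")))
```

The layer needs neither the character set nor the declared length here. The one caveat is a value that legitimately ends in a binary zero, which this approach would truncate.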
--
Olivier Mascia
T.I.P. Group S.A.
http://www.tipgroup.com