Subject | Re: [firebird-support] Right-padded char fields? |
---|---|
Author | Olivier Mascia |
Post date | 2008-09-01T18:14:03Z |
Ivan,
> After prepare, the xsqlda structure contains
> * datatype (CHAR)
> * character set (UTF8)
> * desired buffer length
>
> IMO it is sufficient to get correct number of characters from fixed
> size/padded buffer.
> (well, yes, I assume the application knows the id used by FB to
> describe UTF8)
> But perhaps I am still missing something ? :)
It is sufficient to get the correct *maximum* number of characters
from fixed size/padded buffer.
To get the actual byte count used by a given string, you have to
decode the UTF8 string until you reach the count of characters, which
you deduce by dividing the buffer size by 4. This requires UTF8
knowledge in the intermediate layers, which could have been avoided if
the engine supplied that information. So it is no big issue: adding
UTF8 character counting to the intermediate layers will solve this. It
is just not as slick as it could have been.
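The counting step described above could look roughly like this. It is only a sketch: the function name `utf8_prefix_bytes` is made up for illustration, the input is assumed to be well-formed UTF8, and the caller is assumed to pass the buffer size reported for the CHAR column along with that size divided by 4 as the character count.

```c
#include <stddef.h>

/* Hypothetical helper: returns how many bytes the first `max_chars`
   UTF8 characters of `buf` occupy, never reading past `buf_len`.
   Assumes well-formed UTF8; an unexpected byte is counted as one
   single-byte character so the scan always makes progress. */
size_t utf8_prefix_bytes(const unsigned char *buf, size_t buf_len,
                         size_t max_chars)
{
    size_t pos = 0;    /* byte offset into the buffer */
    size_t chars = 0;  /* characters decoded so far */

    while (pos < buf_len && chars < max_chars) {
        unsigned char c = buf[pos];
        size_t step;

        if (c < 0x80)                step = 1;  /* 0xxxxxxx */
        else if ((c & 0xE0) == 0xC0) step = 2;  /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0) step = 3;  /* 1110xxxx */
        else if ((c & 0xF8) == 0xF0) step = 4;  /* 11110xxx */
        else                         step = 1;  /* stray byte */

        if (pos + step > buf_len)
            break;                              /* truncated sequence */

        pos += step;
        chars++;
    }
    return pos;
}
```

For a CHAR(5) column the buffer would be 20 bytes, so a layer would call `utf8_prefix_bytes(buf, 20, 20 / 4)` and treat everything past the returned offset as padding. The result still includes the space padding up to the declared 5 characters, which is the normal CHAR behaviour.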
You see, with other character sets, a layer can just take whatever
bytes were received (the buffer) and move them to its caller, whether
in buffers or in dynamic strings. This does not require any knowledge
of the character set's storage specifics; what the data means is left
to the calling application. UTF8 breaks that rule because it uses a
variable number of bytes per character, so getting the byte length of
the string (in addition to the byte length of the buffer) would have
been a clever plus.
Another tricky way would be to have the buffer right-padded with zeros
if the stored string uses fewer bytes than its maximum (even after
space right-padding up to the declared count of characters).
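To illustrate why that would help: if the engine did zero-fill the unused tail of the buffer (it does not, as far as this discussion goes), a layer could find the byte length without any UTF8 decoding. This works because a zero byte only occurs in UTF8 as the encoding of U+0000. The helper name `zero_padded_len` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for the imagined zero-padded layout: the string
   ends at the first zero byte, or fills the whole buffer if there is
   none. No knowledge of the character set is needed. */
size_t zero_padded_len(const unsigned char *buf, size_t buf_len)
{
    const unsigned char *nul =
        (const unsigned char *)memchr(buf, 0, buf_len);
    return nul ? (size_t)(nul - buf) : buf_len;
}
```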
VARCHARs should not exhibit the same problem, because I suppose (but
I'll have to recheck this for sure) the real byte length of the string
will be returned (and not the maximum byte length). So whatever the
VARCHAR string received contains, move it to your caller, and you're
done. Again, that simple scheme breaks with CHARs, and just for them
(in the single case of UTF8) comes the need to count the characters to
recompute the exact byte length and truncate where appropriate.
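For comparison, the VARCHAR case might be handled as below. This is a sketch under two assumptions: that the data buffer uses the client API's usual varying layout (a 16-bit length in host byte order followed by the bytes), and, as supposed above, that this length is the real byte length of the string. The names `byte_span` and `varying_payload` are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of the received string: pointer plus byte count. */
typedef struct {
    const unsigned char *data;
    size_t len;
} byte_span;

/* Assumed layout: a 16-bit length prefix followed by the string bytes.
   The length is copied out with memcpy to avoid alignment issues. */
byte_span varying_payload(const void *sqldata)
{
    unsigned short len;
    byte_span s;

    memcpy(&len, sqldata, sizeof len);
    s.data = (const unsigned char *)sqldata + sizeof len;
    s.len = len;
    return s;
}
```

The layer can then hand `s.data` and `s.len` to its caller as they are, without looking at the character set at all.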
Let's leave it there for now: there is a workaround, and maybe one day
the engine will return the missing information.
Yours,
--
Olivier Mascia
T.I.P. Group S.A.
http://www.tipgroup.com