Subject | Re: [firebird-support] Bug with character sets |
---|---|
Author | Kjell Rilbe |
Post date | 2009-05-20T11:22:59Z |
Mark Rotteveel wrote:
> > But the (X)SQLDA does not contain N. It does however contain charset id,
> > which can be used to obtain "max bytes per char", which can be used in
> > combination with the buffer size specified in SQLDA to determine N.
>
> That is not correct. If I use UTF-8, one character can be 1 to 4 bytes.
> Using the buffer size and the 'max bytes per char' you will NOT be able
> to compute the character length.
>
> Example:
> buffer length of 4 bytes, UTF-8 (max 4 bytes per character): is this 1,
> 2, 3 or 4 characters? You simply don't know until you decode.

Corrected example:
Buffer length of 4 bytes, UTF-8 (max 4 bytes per character): will always
contain exactly one significant character and 0-3 spaces, because FB
allocates a buffer with room for exactly 4N bytes for a UTF-8 column
declared as N characters. So, you can reverse that calculation: N is the
buffer length divided by 4.

You assumed that the allocated buffer is exactly large enough to fit the
actual data, while it is really sized to hold the largest possible data
for its charset and declared character length.
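
The reverse calculation can be sketched roughly as follows. This is only an illustration in Python, not Firebird client code: the function names are hypothetical, the buffer length and "max bytes per char" are assumed to have already been read from the SQLDA and the charset metadata, and the buffer is assumed to be space padded as described above.

```python
def declared_char_length(buffer_bytes: int, max_bytes_per_char: int) -> int:
    """Recover N, assuming the buffer was sized as N * max_bytes_per_char."""
    if max_bytes_per_char <= 0:
        raise ValueError("max_bytes_per_char must be positive")
    n, remainder = divmod(buffer_bytes, max_bytes_per_char)
    if remainder:
        # Under the stated assumption the buffer is always a whole multiple.
        raise ValueError("buffer size is not a multiple of max bytes per char")
    return n


def significant_text(raw: bytes, n_chars: int, encoding: str = "utf-8") -> str:
    """Decode a space-padded buffer and keep only the first n_chars characters."""
    return raw.decode(encoding)[:n_chars]


# Hypothetical CHAR(1) UTF-8 column: 4-byte buffer, one character plus padding.
n = declared_char_length(4, 4)
two_byte_value = significant_text("é".encode("utf-8") + b"  ", n)
one_byte_value = significant_text(b"a   ", n)
```

Here N comes from the buffer size alone; decoding is only needed to find where the significant characters end and the padding begins.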
Kjell
--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: kjell@...
Telefon: 08-761 06 55
Mobil: 0733-44 24 64