Subject | texttype_fn_key_length and MBCS |
---|---|
Author | peter_jacobi.rm |
Post date | 2004-05-21T10:38:27Z |
Dear List Members,
I'm wondering about one particular detail in FB's
Intl Architecture: specifying string lengths for MBCS
(i.e. mainly UTF-8 a.k.a. UNICODE_FSS).
As already said, I've (temporarily?) given up on
understanding (let alone modifying) the server's internals
and want to concentrate on the loadable fbintl* modules.
But the otherwise rather clear documentation in
http://www.brookstonesystems.com/CollateKit.zip
doesn't address this relevant detail.
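From Dave's "InterBase Collation Kit" Documentation:
> texttype_fn_key_length
> Calculates key length for a input string length.
From intl/lc_ascii.cpp: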
<cite>
/*
* key_length (in_len)
*
* For an input string of (in_len) bytes, return the maximum
* key buffer length.
*
* This is used for index buffer allocation within the
* Engine.
*/
</cite>
So if this is called with inLen = 30 for UTF-8 texttype,
should this function return:
a) the maximal key length for a UTF-8 string
of up to 30 bytes
OR
b) the maximal key length for a UTF-8 string
of up to 10 characters (30 bytes at UNICODE_FSS's
maximum of 3 bytes per character)
My impression is that the server actually uses it as in b),
given that at most N characters are supposed to go into a CHAR(N)
column, although this limit is sadly not enforced.
Option b) is of course the much more useful scenario once we start
implementing non-trivial collations for UTF-8, as it allows the
function to return a smaller key length.
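To make the difference concrete, here is a minimal sketch of the two readings in C. The function names and the figure of 2 key bytes per character are purely hypothetical and not taken from the Collation Kit or the server source. The only assumption carried over from above is that UNICODE_FSS needs at most 3 bytes per character.

```c
#include <stddef.h>

/* Assumption: UNICODE_FSS stores at most 3 bytes per character. */
#define MAX_BYTES_PER_CHAR 3

/* Hypothetical collation: emits 2 key bytes for every character. */
#define KEY_BYTES_PER_CHAR 2

/*
 * Reading a): in_len is a plain byte count. In the worst case every
 * byte is a one-byte character, so each input byte may need a full
 * per-character key entry.
 */
size_t key_length_bytes_view(size_t in_len)
{
    return in_len * KEY_BYTES_PER_CHAR;
}

/*
 * Reading b): in_len is the byte size of a CHAR(N) column, so the
 * string holds at most N = in_len / 3 characters.
 */
size_t key_length_chars_view(size_t in_len)
{
    size_t max_chars = in_len / MAX_BYTES_PER_CHAR;
    return max_chars * KEY_BYTES_PER_CHAR;
}
```

With inLen = 30 the first function asks for 60 key bytes and the second for 20, which is the saving I have in mind for option b).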
The obvious follow-up question is whether the inLen
parameter of string_to_key can be interpreted the same way.
Regards,
Peter Jacobi