Subject texttype_fn_key_length and MBCS
Author peter_jacobi.rm
Dear List Members,

I'm wondering about one particular detail in FB's
Intl Architecture: specifying string lengths for MBCS
(i.e. mainly UTF-8 a.k.a. UNICODE_FSS).

As already said, I've (temporarily?) given up trying to
understand (let alone modify) the server's internals
and want to concentrate on the loadable fbintl* modules.

But the otherwise rather clear documentation in
http://www.brookstonesystems.com/CollateKit.zip
doesn't address this relevant detail.

From Dave's "InterBase Collation Kit" Documentation
> texttype_fn_key_length
> Calculates key length for a input string length.

From intl/lc_ascii.cpp
<cite>
/*
* key_length (in_len)
*
* For an input string of (in_len) bytes, return the maximum
* key buffer length.
*
* This is used for index buffer allocation within the
* Engine.
*/
</cite>

So if this is called with inLen = 30 for a UTF-8 texttype,
should this function return:

a) the maximal key length for a UTF-8 string
of up to 30 bytes

OR

b) the maximal key length for a UTF-8 string
of up to 10 characters

My impression is that the server actually uses b),
given that only up to N chars are meant to go into a char(N) column
(a limit which, sadly, is not enforced).

Option b) is of course the much more useful scenario once we start
implementing non-trivial collations for UTF-8, as it allows
returning a smaller key length.
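
To make the difference concrete, here is a minimal sketch of the two
readings. Everything in it is an assumption for illustration only: the
function names are made up, the figure of 3 bytes per character is simply
what the 30 bytes / 10 characters example above implies for UNICODE_FSS,
and KEY_BYTES_PER_CHAR = 4 stands in for some hypothetical non-trivial
collation. It is not taken from the fbintl sources.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: UNICODE_FSS reserves at most 3 bytes per character,
// as implied by the "30 bytes vs. 10 characters" example.
const std::size_t MAX_BYTES_PER_CHAR = 3;

// Hypothetical worst-case number of key bytes that a non-trivial
// collation emits per input character.
const std::size_t KEY_BYTES_PER_CHAR = 4;

// Reading a): in_len is a plain byte count. The worst case is a string
// of single-byte characters, i.e. in_len characters in total.
std::size_t key_length_bytes_reading(std::size_t in_len)
{
    return in_len * KEY_BYTES_PER_CHAR;
}

// Reading b): in_len is the byte size reserved for a char(N) column,
// so it holds at most in_len / MAX_BYTES_PER_CHAR characters.
std::size_t key_length_chars_reading(std::size_t in_len)
{
    // This reading only makes sense for whole multiples of the
    // per-character maximum.
    assert(in_len % MAX_BYTES_PER_CHAR == 0);
    const std::size_t max_chars = in_len / MAX_BYTES_PER_CHAR;
    return max_chars * KEY_BYTES_PER_CHAR;
}
```

With inLen = 30, reading a) has to reserve 120 key bytes while reading b)
needs only 40. Reading b) is only safe if the engine really never hands
over more than N characters, which is exactly the point I'm unsure about.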

The obvious follow-up question is whether the inLen
parameter of string_to_key can be interpreted the same way.

Regards,
Peter Jacobi