Subject | Firebird Bug Tracker #916919 |
---|---|
Author | Peter Jacobi |
Post date | 2004-03-16T10:35:07Z |
Regarding the newly filed charset/collation
defect:
https://sourceforge.net/tracker/?func=detail&atid=109028&aid=916919&group_id=9028
I'll try to move the discussion to either firebird-architect or
firebird-support, as
discussing through the bug tracker is ineffective and spams the developer
list.
Scroll down to the bottom, for the newest response.
For context:
Summary: (?)
single-byte charset varchar's in index become multi-byt
create domain name as
varchar(40) character set win1251 collate pxw_cyrl;
create domain code as
varchar(20) character set win1251 collate pxw_cyrl;
create table table1 (
t1_author name,
t1_title name,
t1_code code
);
alter table table1
add constraint t1_uk_author_title_code
unique (author, title, code);
/* key size ~ 40 + 40 + 20 byte = 100, because win1251
charset is single-byte (in table RDB$CHARACTER_SETS
field RDB$BYTES_PER_CHARACTER is 1 for this charset);
according this
material:
http://ibphoenix.com/main.nfs?a=ibphoenix&l=;IBPHOENIX.KNOWLEDGEBASE;ID='116',
, key size must be < 256 bytes, and, in this case,
~ 100 << 256 */
Then getting error:
"This operation is not defined for system tables.
unsuccessful metadata update.
key size too big for index t1_uk_author_title_code."
--------------------------------------------------------------------------------
Date: 2004-03-16 09:34
Sender: peter_jacobi
Logged In: YES
user_id=845149
The behaviour is as designed (but the design can be
questioned). The current implementation of collations needs
sortkeys about three times as large as the original string.
For performance reasons all index comparisons are straight
binary, and so columns with collation must be translated into
something whose binary comparison gives the desired result.
The actual implementation of this string=>sortkey translation
has room for improvement.
--------------------------------------------------------------------------------
Date: 2004-03-16 11:06
Sender: boreas
Logged In: YES
user_id=998693
Such behaviour appears only with collation order PXW_CYRL!
For default collation order (collation order WIN1251 for
charset WIN1251, I mean) I have successfully created a key
with two 100-chars and one 20-chars fields (more then 220
bytes and approximately corresponds maximum 256 bytes)...
Any ideas? To use WIN1251 collation order?..
--------------------------------------------------------------------------------
The behaviour occurs with every collation that uses multi-level sort order.
If you don't
know what multi-level collations are, it may be the case that you don't need
them.
For the collations named as the underlying charsets (and effective as
default), this
problem doesn't exist, as the sortkey is equal to the string itself. They
implement
a sort order matching the charset's codepoint order. If this OK for you, or
if you just need
*any* sort of index for uniqueness testing, use them by all means!
Regards,
Peter Jacobi
--
+++ NEU bei GMX und erstmalig in Deutschland: TÜV-geprüfter Virenschutz +++
100% Virenerkennung nach Wildlist. Infos: http://www.gmx.net/virenschutz
defect:
https://sourceforge.net/tracker/?func=detail&atid=109028&aid=916919&group_id=9028
I'll try to move the discussion to either firebird-architect or
firebird-support, as
discussing through the bug tracker is ineffective and spams the developer
list.
Scroll down to the bottom, for the newest response.
For context:
Summary: (?)
single-byte charset varchar's in index become multi-byt
create domain name as
varchar(40) character set win1251 collate pxw_cyrl;
create domain code as
varchar(20) character set win1251 collate pxw_cyrl;
create table table1 (
t1_author name,
t1_title name,
t1_code code
);
alter table table1
add constraint t1_uk_author_title_code
unique (author, title, code);
/* key size ~ 40 + 40 + 20 byte = 100, because win1251
charset is single-byte (in table RDB$CHARACTER_SETS
field RDB$BYTES_PER_CHARACTER is 1 for this charset);
according this
material:
http://ibphoenix.com/main.nfs?a=ibphoenix&l=;IBPHOENIX.KNOWLEDGEBASE;ID='116',
, key size must be < 256 bytes, and, in this case,
~ 100 << 256 */
Then getting error:
"This operation is not defined for system tables.
unsuccessful metadata update.
key size too big for index t1_uk_author_title_code."
--------------------------------------------------------------------------------
Date: 2004-03-16 09:34
Sender: peter_jacobi
Logged In: YES
user_id=845149
The behaviour is as designed (but the design can be
questioned). The current implementation of collations needs
sortkeys about three times as large as the original string.
For performance reasons all index comparisons are straight
binary, and so columns with collation must be translated into
something whose binary comparison gives the desired result.
The actual implementation of this string=>sortkey translation
has room for improvement.
--------------------------------------------------------------------------------
Date: 2004-03-16 11:06
Sender: boreas
Logged In: YES
user_id=998693
Such behaviour appears only with collation order PXW_CYRL!
For default collation order (collation order WIN1251 for
charset WIN1251, I mean) I have successfully created a key
with two 100-chars and one 20-chars fields (more then 220
bytes and approximately corresponds maximum 256 bytes)...
Any ideas? To use WIN1251 collation order?..
--------------------------------------------------------------------------------
The behaviour occurs with every collation that uses multi-level sort order.
If you don't
know what multi-level collations are, it may be the case that you don't need
them.
For the collations named as the underlying charsets (and effective as
default), this
problem doesn't exist, as the sortkey is equal to the string itself. They
implement
a sort order matching the charset's codepoint order. If this OK for you, or
if you just need
*any* sort of index for uniqueness testing, use them by all means!
Regards,
Peter Jacobi
--
+++ NEU bei GMX und erstmalig in Deutschland: TÜV-geprüfter Virenschutz +++
100% Virenerkennung nach Wildlist. Infos: http://www.gmx.net/virenschutz