| Subject | Need a Clue about Cyrillic | 
|---|---|
| Author | Jim Starkey | 
| Post date | 2004-09-09T22:41:22Z | 
I'm trying to get Vulcan past the Jaybird JUnit test suite. I'm getting
hung up in the test FBEncodings, which does an upcase on Cyrillic. The test
stores a string consisting of the bytes from 0xE0 to 0xEF in a Cyrillic
field, then fetches them upcased. According to the test source
(TestFBEncodings.java), the right answer is a string consisting of the
bytes 0xC0 to 0xCF. Vulcan is returning the original string unchanged.
From the other end, the upcase operator does a lookup on the character
set id 50, defined in intlnames.h as
CHARSET("CYRL", CS_CYRL, 0, 1, 256, CS_cyrl, CYRL_c0_init)
COLLATION("DB_RUS", CC_RUSSIA, CS_CYRL, 1, CYRL_c1_init)
COLLATION("PDOX_CYRL", CC_RUSSIA, CS_CYRL, 2, CYRL_c2_init)
END_CHARSET
CYRL_c0_init, the character set initialization function, is defined in
lc_ascii.cpp as:
TEXTTYPE_ENTRY(CYRL_c0_init)
{
static const ASCII POSIX[] = "C.CYRL";
FAMILY_ASCII(parm1, CYRL_c0_init, CS_CYRL, CC_C);
TEXTTYPE_RETURN;
}
The relevant part of FAMILY_ASCII (a very large, very ugly macro) defining
the string update function is:
cache->texttype_fn_str_to_upper = (FPTR_short) famasc_str_to_upper; \
And, finally, the key line of famasc_str_to_upper, also in lc_ascii.cpp, is:
*pOutStr++ = ASCII7_UPPER(*pStr);
where ASCII7_UPPER is (hold your breath):
#define ASCII7_UPPER(ch) \
((((UCHAR) (ch) >= (UCHAR) ASCII_LOWER_A) && ((UCHAR) (ch) <= (UCHAR) ASCII_LOWER_Z)) \
? (UCHAR) ((ch)-ASCII_LOWER_A+ASCII_UPPER_A) \
: (UCHAR) (ch))
Now, counting on my fingers, 0xE0 is not between 'a' and 'z', suggesting
strongly that it will be untouched by the upcase operation, suggesting,
in turn, that the JUnit test is wrong. But Firebird 1.5 seems to pass it.
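To make the finger-counting concrete, here is a minimal standalone sketch of the quoted macro applied to the test bytes. The values of ASCII_LOWER_A, ASCII_LOWER_Z and ASCII_UPPER_A are assumed to be 'a', 'z' and 'A' (their definitions are not quoted above), and ascii7_upcase and expected_upcase are hypothetical helpers written for illustration, not Firebird or Vulcan functions. expected_upcase only encodes the 0x20 offset between the bytes the test stores (0xE0 to 0xEF) and the bytes it expects back (0xC0 to 0xCF).

```c
#include <stddef.h>

typedef unsigned char UCHAR;

/* Assumed values; the real definitions are not shown in the post. */
#define ASCII_LOWER_A 'a'
#define ASCII_LOWER_Z 'z'
#define ASCII_UPPER_A 'A'

/* The macro as quoted from lc_ascii.cpp. */
#define ASCII7_UPPER(ch) \
    ((((UCHAR) (ch) >= (UCHAR) ASCII_LOWER_A) && ((UCHAR) (ch) <= (UCHAR) ASCII_LOWER_Z)) \
        ? (UCHAR) ((ch) - ASCII_LOWER_A + ASCII_UPPER_A) \
        : (UCHAR) (ch))

/* Hypothetical stand-in for the loop in famasc_str_to_upper:
   applies ASCII7_UPPER to each byte of the input. */
void ascii7_upcase(const UCHAR *in, UCHAR *out, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = ASCII7_UPPER(in[i]);
}

/* Hypothetical illustration of what the test expects: bytes in
   0xE0..0xEF come back as 0xC0..0xCF, i.e. shifted down by 0x20.
   Every other byte is left alone here. */
void expected_upcase(const UCHAR *in, UCHAR *out, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = (in[i] >= 0xE0 && in[i] <= 0xEF) ? (UCHAR) (in[i] - 0x20) : in[i];
}
```

Feeding the sixteen bytes 0xE0 through 0xEF to ascii7_upcase returns them unchanged, which matches what Vulcan does, while expected_upcase yields 0xC0 through 0xCF, which is what the test wants.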
So I'm stumped. Is:
1. The test wrong, i.e. upcase of 0xE0 should in fact be 0xE0 and not
0xC0?
2. The internationalization module coded wrong?
3. The internationalization module built wrong?
4. My understanding of this whole corner of the world all screwed up?
5. All or some of the above
6. None of the above.
I need a clue. The first person to successfully throw me a line will
have his or her name enshrined in the Vulcan international module for
all of posterity.
Help!
--
Jim Starkey
Netfrastructure, Inc.
978 526-1376
[Non-text portions of this message have been removed]