Subject | Case and Accent insensitive compares |
---|---|
Author | Stefan Heymann |
Post date | 2016-06-15T15:28:34Z |
I expect that an accent insensitive compare treats accented characters
as the "same" as their un-accented counterparts because the accent
does not change the character itself but things like pronunciation or
stress.
So in French, à is similar to a, é is similar to è and you use an
accent insensitive compare to find Gérard even though your search term
says Gerard (without the accent).
However, in the German language, the letters Ö and O are two different
characters with a completely different pronunciation (the same is
true for A/Ä and U/Ü). As they look similar, the sorting is done so
that they stay together, but they can _not_ be treated as accented
versions of each other.
When I use the UNICODE_CI_AI collation to compare them, Firebird
treats them as the same:
select case when 'a' = 'ä' collate unicode_ci_ai then 'equal' else 'not equal' end || ' expected: not equal' from rdb$database
union all
select case when 'O' = 'Ö' collate unicode_ci_ai then 'equal' else 'not equal' end || ' expected: not equal' from rdb$database
union all
select case when 'Ä' = 'ä' collate unicode_ci then 'equal' else 'not equal' end || ' expected: equal' from rdb$database
union all
select case when 'a' = 'à' collate unicode_ci_ai then 'equal' else 'not equal' end || ' expected: equal' from rdb$database
union all
select case when 'c' = 'ç' collate unicode_ci_ai then 'equal' else 'not equal' end || ' expected: equal' from rdb$database
union all
select case when 'é' = 'è' collate unicode_ci_ai then 'equal' else 'not equal' end || ' expected: equal' from rdb$database
delivers:
equal expected: not equal
equal expected: not equal
equal expected: equal
equal expected: equal
equal expected: equal
equal expected: equal
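To make the expected behaviour concrete, here is a minimal sketch in Python (not Firebird code). It assumes that the German umlauts are the only letters that must stay distinct; the names GERMAN_DISTINCT, fold and equal_ci_ai are made up for this illustration. It strips combining marks after Unicode decomposition, which is roughly why a plain accent insensitive compare sees ä as an accented a, but skips that step for the umlauts.

```python
import unicodedata

# Assumption for this sketch: letters that German treats as separate
# base letters rather than as accented variants of a, o and u.
GERMAN_DISTINCT = set("äöüÄÖÜ")


def fold(text, keep=GERMAN_DISTINCT):
    """Case-fold text and strip accents, except on letters listed in keep."""
    out = []
    # NFC first, so that 'a' + combining diaeresis is seen as the single letter 'ä'.
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            # Protected letter: ignore case only, keep the umlaut.
            out.append(ch.casefold())
        else:
            # Decompose into base letter + combining marks, then drop the marks.
            decomposed = unicodedata.normalize("NFD", ch)
            base = "".join(c for c in decomposed if not unicodedata.combining(c))
            out.append(base.casefold())
    return "".join(out)


def equal_ci_ai(a, b):
    """Case and accent insensitive equality that keeps German umlauts distinct."""
    return fold(a) == fold(b)
```

With this sketch, equal_ci_ai('a', 'ä') and equal_ci_ai('O', 'Ö') are false, while 'Ä'/'ä', 'a'/'à', 'c'/'ç', 'é'/'è' and 'Gerard'/'Gérard' all compare as equal, which matches the "expected" column above.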
Is there something that can be done to improve this?
Regards
Stefan
--
Stefan Heymann, Tübingen, Germany