Subject | Re: [firebird-support] Re: Problem with COLLATE EN_UK |
---|---|
Author | Ann W. Harrison |
Post date | 2005-10-13T21:08:52Z |
artlooksoftware wrote:
> 1. Why does EN_US sort quite happily on commas and spaces while EN_UK
> does not?
Somewhere there's a specification - probably somewhere on the Posix site
- of the meaning of different collations. I haven't yet found the
mother lode, but here's something:
LC_COLLATE
# The following is the Posix Locale definition of LC_COLLATE for UK English.
# The ordering algorithm defined here is:
# . All characters not specifically defined in this collating
# sequence are ordered first, according to their coded
# character set values.
# . The character <NS>, the 'no-break space', has the same
# collation characteristics as <SP>.
# . The <SP> character has no collation weight, ie 'asmith'
# collates to the same value as 'a smith', but the number of
# spaces are relevant, so 'a smith' collates before 'a  smith'.
# . Lower-case alphabetics have the same primary collate-weight
# as their upper-case equivalents; upper-case alphabetics have
# higher secondary weights, so 'A smith' collates before
# 'a smith'.
# . No special ordering is imposed for accented characters. This
# is the UK English locale.
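To make those rules concrete, here is a rough Python sketch of a sort key that follows them. It is only an illustration, not how Firebird or any Posix library implements the collation. The function name `uk_style_key` is made up, and the order of the two tie-breakers (space count first, then case) is an assumption, since the comments above don't say which one wins.

```python
def uk_style_key(s: str):
    """Approximate sort key for the rules quoted above (illustrative only)."""
    # The no-break space collates like an ordinary space.
    s = s.replace("\u00a0", " ")
    # Primary level: spaces carry no weight and case is ignored.
    primary = s.replace(" ", "").casefold()
    # Tie-breaker (assumed to come first): fewer spaces sort earlier.
    space_count = s.count(" ")
    # Tie-breaker: upper case sorts before lower case at each position,
    # matching the quoted example 'A smith' before 'a smith'.
    case = tuple(0 if ch.isupper() else 1 for ch in s if ch != " ")
    return (primary, space_count, case)


names = ["a  smith", "a smith", "asmith", "A smith"]
print(sorted(names, key=uk_style_key))
```

All four strings share the primary key 'asmith', so only the tie-breakers decide their relative order.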
The international definition for US collation does consider spaces and
punctuation to have "collation weight". In other words, it's arbitrary
bureaucracy at work.
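A small Python sketch of why that difference shows up in sorted output. Both key functions are hypothetical: plain code-point order stands in for real collation weights, and the second key assumes punctuation is ignored along with spaces, as the question suggests for EN_UK.

```python
def weighted_key(s: str) -> str:
    # Spaces and punctuation take part in the comparison; code-point
    # order is a stand-in for real collation weights.
    return s.casefold()


def ignoring_key(s: str) -> str:
    # Spaces and punctuation contribute nothing to the comparison.
    return "".join(ch for ch in s.casefold() if ch.isalnum())


names = ["smithers, al", "smith, john", "smith john"]
print(sorted(names, key=weighted_key))
print(sorted(names, key=ignoring_key))
```

With the weighted key both 'smith' entries sort ahead of 'smithers, al'. With the ignoring key 'smithers, al' comes first, because 'smithersal' compares lower than 'smithjohn'.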
Another issue is that the correct name for EN_UK is EN_GB, someone at
ISO having decided that the correct two-letter code for the United
Kingdom is GB.
> 2. Can you point me to anywhere which might help with creating my own
> collation order? I program in Delphi not C however.
That's going to be hard - possible, but hard. You can start at
http://www.brookstonesystems.com
Regards,
Ann