Subject RE: [firebird-support] duplicates in a table with PK
Author Helen Borrie
At 11:07 PM 1/06/2007, you wrote:
>Hi Helen,
>
> > You already did - gstat -h told you the default character set is
> > NONE. That means data are stored in straight ascii format. The
> > database won't know how to locate strings containing accented
> > characters. Your sample data, at least, doesn't appear to contain
> > accented characters.
>
>Data in the VARCHAR(100) should not contain
>accented characters, but it might contain Ñ. Don't know if that is a problem.

Anticipate problems with charset NONE and accented characters.


> > If you are storing strings with Spanish accented characters, using
> > character set NONE is going to bite you one way or another, since it
> > recognises only the US ASCII set of characters as characters.
> >
> > I suggest you treat today's database rebuild as a practice exercise
> > but rebuild the for-production database with a Spanish-aware
> > character set as the default character set and a suitable collation
> > on indexed fields to allow searching and matching to make
> > sense. Then make sure that your client programs connect with that
> > character set.
>
>Ok. We'll do that. For new databases is it
>enough to select CHARSET ISO8859_1 in
>FlameRobin's "Create new database" window?

Yes.

>Will that set the correct collation as well?

No. You don't set collation at database
level. All characters in a character set have a
numeric code. The default collation sequence for
all character sets is by code, which isn't very
useful if you want ordered sets. You can define
the ES_ES collation sequence on any character
fields that you will want to be available as
ordering criteria. The simplest way to do this
is to define domains that include the required COLLATE attribute.

>How can I rebuild the old one with the correct settings. Using gbak maybe?

No, you will need to reconstruct the database from scratch.

>Finally, any idea on why the duplicate values appeared ?

Yes. My prime suspect would be that varchar(100)
column and the likelihood that the database has
stored characters that your utility programs cannot display distinctly.

After that comes the (more remote) possibility
that you were using a database that had been
partially restored, i.e. that had restored all of
the data but had been unable, finally, to
activate the unique index due to some faulty
data. The data fault itself might have arisen
from some character set problem. In that case,
of course, the unique constraint would not be
operational and there would be nothing to prevent duplicate keys.

If you still have the old database, you could
check the state of that index using gstat
-i. Amongst other things, it will tell you whether that index was inactive.

Every driver has its own peculiar way of managing
the language code interface provided by the
API. Considering that you are manufacturing that
varchar index key in your .NET application, it
won't do any harm to ask on the
firebird-net-provider list about any known
transliteration problems with that driver when
attempting to plug non-ascii strings into an
ascii database. An added bonus is that its
developer (Carlos) is a native speaker of
Spanish. I doubt there is anyone in the Firebird
community better equipped to help in your particular case.

./heLen