Subject Re: [firebird-support] Special Character Problems (Trademark Symbol)
Author Helen Borrie
At 06:32 PM 30/08/2003 +0000, you wrote:
>I'm having a problem with getting data out of an IB db using Free
>IBProvider odbc driver.
>I'm trying to pull the data using ADO, to push the data into an
>Access db.
>The problem is, some of the IB data contains the trademark symbol
>(™), and it's causing an error.
>arithmetic exception, numeric overflow, or string truncation
>Cannot transliterate character between character sets
>Here's the sample text:
>When I view the data through IBConsole, it doesn't appear properly,
>it doesn't show the ™ symbol, in either the tableview, or
>Interactive SQL, but when I copy and paste that text into anything
>in Windows XP, it shows the text properly.
>The character set for the database seems to be default.
> >From IBConsole, View Metadata:
>/* CREATE DATABASE 'C:\Temp\condata.dat' PAGE_SIZE 4096
>I've tried a few character types in the connection string
> IBdbSource = 'C:\Temp\condata.dat'
>adoIBConn.ConnectionString = "provider=LCPI.IBProvider;data
>source=localhost:" & _
> IBdbSource
>& ";ctype=win1251;" & _
> "user
>I've tried
>I'm also not getting the text out of blob fields properly, not sure
>if this is an indicator of a larger issue or not.
>Any help would be greatly appreciated.

If you like, it's an indicator of some larger issues.

First, Access is a Windows-only desktop application with a proprietary,
dedicated user interface. It has access to Microsoft's proprietary 4-digit
mappings for its own sets of character-like symbols. To taste for yourself
what's playing here, create a Word file using the trademark symbol and save
it as text (not MS-DOS text). Then open it with Notepad. In the course of
translating the characters to ascii, Word replaces the trademark symbol
with the characters "(tm)".

You can only view Access data from its native interface (or, in a
third-party Windows application that has the ability to read the
Access-native format and correctly handle these 4-digit mappings).

There's hope if you can paste stored text containing this symbol and/or get
a transliteration error when trying to read it. It means something is
being stored.

However, the RDBMS doesn't have these interface capabilities. It is a
Windows application, in that it is compiled to run on Windows. The same
software is also compiled on many other platforms. It stores data that are
compatible across all platforms - unlike Access, it is not a "for Windows"

It will store whatever you give it but, if it encounters characters in a
column that are not included the character set defined for that column, you
get the transliteration error when the server tries to read it.

The default charset, if you don't define one, is NONE. To be able to
retrieve characters that are not in the base range of US ascii (1 to 254)
you need to define that column as a charset that does contain the chars you
want. You can store non-printing characters in that range, by their ascii
codes, but this doesn't help you with these specially-mapped Windows symbols.

Now, assuming you can find a charset that supports the 4-digit extended
characters - one of the WIN sets could be worth trying - you need to define
that column to use (store and retrieve) characters from that set. You can
include that definition when defining the column, or a domain which that
column uses, or at the database level. To complete the loop, you have to
have that character set available in your client software.

(I can't help you regarding which, if any, of the IB charsets supports the
4-digit symbols...)

It's no good trying to force a character set by setting the client
parameter alone. They have to support each other. The lc_ctype parameter
in the connection provides a convenient way to put the database, the client
and the application in a closed loop for character set handling, when the
character set has been defined at database level.

If all of your data needs to assume the presence of extended characters
then, provided you find a suitable character set, this would be the right
way for you to go. Your character and text blob columns will be stored and
retrieved using that charset. If you go the way of defining character set
column-by-column, life gets a lot more complicated.

Make sure you define text blobs explicitly. Just "BLOB" will get you a
sub_type 0 blob definition which won't necessarily play nice when you try
to perform text operations on it.