Subject Re: [] Re: Dialect problem??
Author Helen Borrie
At 05:26 PM 5/02/2004 +0000, you wrote:

>Now here is where I'm a little confused. In the docs (somewhere) I
>read that in blobs (subtype 1) the character set *is* taken into
>consideration.

Yes, it is, for searches (LIKE, STARTING WITH, CONTAINING).

>So am I wrong in my assumption that if the blob and my
>varchar have the same character set id there shouldn't be this
>transliterate error occurring when I try to update one to the other?

As I understand it, no. When the blob comes and goes as a blob, the
database doesn't read what's in the byte stream. But what you are doing is
taking this stream of bytes and converting it to a string (for some
unfathomable reason!!). So the illegal characters show up when you
present it as a varchar.
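
As a rough analogy only (this is Python's bytes/text split, not Firebird's engine code), the difference between moving bytes around and interpreting them as ASCII text can be sketched like this. The sample content and the helper name to_ascii_text are made up for illustration, and Python's "ascii" codec accepts 0 through 127, which may not match the engine's limits exactly.

```python
# Analogy only: raw bytes travel untouched; the error appears on interpretation.
# Assumed sample: text as a Windows application might have stored it (cp1252).
blob = "caf\u00e9 \u2019quoted\u2019".encode("cp1252")

# "Blob comes and goes as a blob": the bytes are copied without being read.
copied = bytes(blob)


def to_ascii_text(data: bytes) -> str:
    """Interpret raw bytes as ASCII text (hypothetical helper).

    Raises UnicodeDecodeError as soon as a byte above 127 is met.
    """
    return data.decode("ascii")


try:
    to_ascii_text(copied)
    error = None
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first byte that is not valid ASCII
    error = exc
    print("cannot present as ASCII text, first bad byte at offset", exc.start)
```

The copy succeeds regardless of content; only the conversion to text fails, which is the same shape as the problem described above.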

Now, there are two unknowns here:
First, I don't see that you've mentioned anywhere that you have ASCII set
up as the client-connection character set. If the client is character set
NONE and the database is ASCII (you can check the database default by
looking at RDB$CHARACTER_SET_NAME in RDB$DATABASE) then you have a
client-database mismatch. If the transliteration of a
particular character is valid, the engine can perform it automatically,
provided it knows the character set of the incoming data.

Secondly, I *believe* the ASCII character set is limited to characters from
ascii 32 through 126 - the characters you can type on a US ASCII keyboard
without using an ALT-sequence. I'm sure I've seen this stated in a list
somewhere. Having bumped into this before with ASCII, I'm pretty certain
there is no documentation around that can help sort it out, but you can
write your own test for it quite easily.
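
One possible shape for such a test, sketched in Python on the client side rather than in the database: scan the raw bytes and report anything outside 32 through 126. That range is the limit suggested above, not a documented one, and letting tab, CR and LF through is an extra assumption. The function name find_suspect_bytes and the sample data are hypothetical.

```python
def find_suspect_bytes(data: bytes, allowed_controls: bytes = b"\t\r\n"):
    """Return (offset, byte value) pairs for bytes outside 32..126.

    The 32..126 range is the limit believed to apply, not a confirmed one.
    Tab, CR and LF are let through on the assumption that line breaks are fine.
    """
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if not (32 <= value <= 126) and value not in allowed_controls
    ]


# Assumed sample: text containing byte 146 twice, with Windows line breaks.
sample = b"This is a test.\r\n\x92 \x92\r\nend of test."
suspects = find_suspect_bytes(sample)

for offset, value in suspects:
    print(f"offset {offset}: byte {value}")
```

Running something like this over the blob contents before converting them should at least show which bytes are candidates for the transliteration error.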

You say that the blob data came through by being copy-pasted from Word into
Access...if I understood that correctly, then the Access data would respect
the special Windows extended character mappings (that you can access in
Word via "Insert..Symbol").

Now, here's an interesting test. In Word, I created this document and
saved it first as "Text with Line Breaks" (which is ANSI text, Windows
"native" text format). I used Insert..Symbol to get the ascii 146 (AE
diphthong) in there. Here's what was saved:

This is a test.

Æ Æ Æ Æ

end of test.

(my mail client currently displays the AE diphthong correctly...)

Next, I saved it as "MS-DOS text with Line Breaks" (US ASCII text) and
here's what was saved:

This is a test.

’ ’ ’ ’

end of test.

The next step with this would be to create a database with default
character set ASCII, connect to it with a client also set to ASCII and see
where you get to with these two strings. The ASCII substitute string looks
like an apostrophe (ascii 39) so my guess is that it would trip over on the
first non-escaped apostrophe character.
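
Before running that test, it may help to look at the byte values behind the two saved files. The sketch below assumes that the "ANSI" save used Windows code page 1252 and the "MS-DOS" save used code page 437; both depend on the machine's locale settings, so treat them as assumptions. The variable names are illustrative.

```python
ae = "\u00c6"  # LATIN CAPITAL LETTER AE

# Assumed code pages: cp1252 for the Windows "ANSI" save, cp437 for MS-DOS text.
ansi_bytes = ae.encode("cp1252")
dos_bytes = ae.encode("cp437")

# The MS-DOS byte, read back as if it were Windows ANSI text:
misread = dos_bytes.decode("cp1252")

print("ANSI byte for AE:  ", ansi_bytes[0])
print("MS-DOS byte for AE:", dos_bytes[0])
print("MS-DOS byte shown as ANSI:", repr(misread), "code point", ord(misread))
```

Under those assumptions the AE diphthong is byte 198 in the ANSI file and byte 146 in the MS-DOS file, and byte 146 viewed as Windows text is the typographic right single quotation mark, not the plain apostrophe (ascii 39). If that holds, both files contain a byte above 126, so both would be worth feeding to the ASCII database test.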

This is interesting from an *academic* point of view but I do wonder why
you even consider storing a 32Kb varchar in your database...

/helen

/hb