Subject Re: [firebird-support] FB 3 issues with String from FB 2.54
Author Mark Rotteveel
On 23-5-2016 03:45, fabianch@... [firebird-support] wrote:
> I have been trying to migrate from FB2.54 into FB 3 for a few weeks, and
> after hitting a string related error for some time I have got to the
> point where I do understand the issue, but I don't know how to solve it.
> The issue is pretty simple, the FB 2.54 DB contains a few characters
> that are not allowed into the FB 3 database, one example of a character
> causing an error during the restore was "Mcgarrity’s" (note the ’) as it
> appears to be outside the scope of the FB3 string domain. I have tried
> creating a new FB3 DB with many different charsets but none works. The
> other string causing issues is for example "΢ÈíÑźÚ", I have many
> records with this type of strings because the DB contains raw emails
> received by the system, stored into Varchars, and apparently some emails
> contain very weird characters, all were handled by FB2.54 but FB3
> rejects the records. I have been able to isolate all records with issues
> using IBExpert's table data comparer function, as it created a script
> with all records from all tables from FB2.54 and when running the script
> against FB3.0 it singles out all the offending records.
>
> Can anyone advise what options I have available to force FB3.0 to accept
> any stuff into string fields?

In your other e-mail you indicate you solved this by changing the
character set from ASCII to NONE. The fact that it worked before was a bug,
see http://tracker.firebirdsql.org/browse/CORE-3416. ASCII only supports
characters 0-127; characters outside that range are 'extended ASCII', i.e.
they belong to one of the single-byte character sets like WIN1252 or
ISO8859_1. The
characters shown (΢ÈíÑÅºÚ and ’) are all outside the ASCII range.
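
As a rough illustration of the 0-127 limit (plain Python, not Firebird
itself; the helper name high_bytes and the sample value are made up for
this sketch, and I am assuming the value was stored as Windows-1252
bytes), this lists the bytes a strict ASCII column has to reject:

```python
def high_bytes(raw):
    """Return every byte value above 127, i.e. outside the ASCII range."""
    return [b for b in raw if b > 127]


# Assumption: the text reached the database as Windows-1252 bytes.
stored = "Mcgarrity\u2019s".encode("cp1252")
plain = b"Mcgarrity's"

print(high_bytes(stored))  # [146]
print(high_bytes(plain))   # []
```

Any value for which this list is non-empty cannot be valid in a column
declared as ASCII.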

The last (’) is particularly nasty, because it should have been a '
(U+0027 Apostrophe, ASCII 39); instead U+2019 Right single quotation
mark (character 146 in Windows-1252) was used.
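
The difference between the two characters can be checked with a few
lines of Python (only a sketch using the standard library; the variable
names are mine):

```python
import unicodedata

apostrophe = "\u0027"
right_quote = "\u2019"

print(unicodedata.name(apostrophe))      # APOSTROPHE
print(unicodedata.name(right_quote))     # RIGHT SINGLE QUOTATION MARK

print(apostrophe.encode("ascii")[0])     # 39
print(right_quote.encode("cp1252")[0])   # 146

# The right single quotation mark has no ASCII representation at all.
try:
    right_quote.encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False

print(fits_in_ascii)                     # False
```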

Given the context of e-mails, either NONE or OCTETS is the only real
option, as e-mails can have multiple parts, each with its own character
set, and can also have binary parts (although usually those are encoded
with something like base64).
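
To make the multi-part point concrete, here is a small sketch with
Python's standard email package (the message content is invented for
illustration and has nothing to do with your data). It builds one message
with two text parts in different character sets plus a binary part:

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

message = MIMEMultipart()
# Two text parts, each declaring a different character set.
message.attach(MIMEText("Gr\u00fc\u00dfe", "plain", "iso-8859-1"))
message.attach(MIMEText("\u0394", "plain", "utf-8"))
# A binary part; MIMEApplication base64-encodes its payload by default.
message.attach(MIMEApplication(b"\x00\xff\x92"))

parts = message.get_payload()
charsets = [part.get_content_charset() for part in parts]
binary_encoding = parts[2]["Content-Transfer-Encoding"]

print(charsets)         # ['iso-8859-1', 'utf-8', None]
print(binary_encoding)  # base64
```

No single character set describes the whole message, which is why
storing the raw e-mail in a column with a specific character set does not
really work.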

Mark
--
Mark Rotteveel