Subject Re: [firebird-support] Converting of an ISO8859_1 database to UTF8
Author Pierre Y.
Not sure but FBClone (via
http://code.google.com/p/fbclone/wiki/ConvertBadEncodedFirebirdDatabases)
could help.

I embedded its code into my application to provide automagic ISO8859_1 to
UTF8 conversion for my application (including BLOB SUBTYPE 1 containg XML
data)


On Thu, Nov 1, 2012 at 8:43 AM, patrick_marten <patrick_marten@...>wrote:

> **
>
>
> Hello,
>
> I'm working on a new release of my application, in which some bigger
> changes are supposed to happen. I need some assistance please :)
>
> Sorry, it's a bit long, but it's about a sensitive matter, so I wanted to
> provide as many details as possible.
>
> 1. The current version
> -----------------------
> 1.1 It's using Firebird 2.1.3.18185 to access a database with ISO8859_1
> character set.
> 1.2 Several tables of the database have fields declared as follows:
> FIELDX BLOB SUB_TYPE 1 SEGMENT SIZE 80 CHARACTER SET ISO8859_5,
> FIELDY VARCHAR( 100) CHARACTER SET ISO8859_5,
> Those fields weren't used so far, because IBO didn't support unicode when
> this version was created, but they do play a role below, so I'm mentioning
> them.
> 1.3 Several tables contain BLOB SUB_TYPE 1 fields, which store RTF-text in
> them
>
> 2. The new version
> -------------------
> 2.1 Although it's not neccessary, I was planning to use the latest
> Firebird version - currently 2.5.1.26351
> 2.2 Since IBO supports unicode now, my application is supposed to support
> it as well, so that the "new" database will have UTF8 as default character
> set
> 2.3 Fields mentioned in 1.2 will no longer have to have a special
> character set - everything UTF8
> 2.4 Fields mentioned in 1.3 have to become BLOB SUB_TYPE 0 fields, as the
> content will be stored in a different way in the new version. The content
> must not get lost of course
> 2.5 A lot of structure changes will come with the new version too (new
> tables, fields etc.)
>
> While all the changes are not a problem for new customers, they pretty
> much are a big issue for existing customers, as I will have to "convert"
> their existing databases, once they install the new version. So the
> requirements from point 2 have to happen somehow on their machines.
>
> 3. My ideas / concerns so far
> ------------------------------
> 3.1 Using the script "convert2utf8.sql" from IBExpert site (
> http://www.ibexpert.com/download/character_set_conversion/)
> I've tried it, but am not very happy with the result:
> 3.1.1 Problem #1 is: the fields mentioned in 1.2 remain as they are, i.e.
> ISO8859_5 in that case. The script probably has to be executed several
> times - each time with a different source_characterset, i.e.
> source_characterset=ISO8859_1, source_characterset=ISO8859_5, ...
> 3.1.2 Problem #2 is: although all other columns seem to have UTF8 as
> character set, the database itself still seems to have ISO8859_1 as
> character set (showed in IBExpert)
> 3.1.3 Problem #3 is: after my attempt to convert an existing database it
> seems to miss 5 tables, 6 views, 58 indicies and to have 3 triggers more
> than in the source database...
> 3.1.4 Structure changes (2.4 and 2.5) would have to happen additionally
> afterwards
> All in one I'm not very convinced by this option...
>
> 3.2 Using a component like TIB_DataPump
> I don't have any experience with it and don't know, if that's an option to
> accomplish what I need. The thoughts so far:
> 3.2.1 I probably would have to make an UTF8 database first, which has the
> same structure as the source database - just different character sets
> 3.2.2 One would be accessed by a connection with ISO8859_1 as character
> set, one by a connection with UTF8 as character set
> 3.2.3 However the component would pump the data, it would get converted to
> UTF8 that way (I hope)
> 3.2.4 Structure changes (2.4 and 2.5) would have to happen additionally
> afterwards
>
> 3.3 Doing everything manually
> Means probably the most work, but appears to be the safest way at the
> moment.
> 3.3.1 I could start with the final UTF8 database, i.e. with all the
> structure changes etc. included
> 3.3.2 The source database would be accessed by a connection with ISO8859_1
> as character set, the UTF8 one by a connection with UTF8 as character set
> 3.3.3 Going through all the tables and "pump" the data manually into the
> new database, whereby the conversion to UTF8 would happen automatically (I
> hope). Good thing would be, that I could do the steps like 2.4 at the same
> time
>
> 3.4 Some other / better way(s) to accomplish what I need, you could think
> of
>
> The new version will be installed into a new folder, so that there will be
> files of two different firebird versions on the machine, until the old
> version gets uninstalled. So I also would need to know when and how the
> jump to Firebird 2.5.1 should be done, especially considering the note I
> found in one of the PDF files: "IMPORTANT: Before upgrading please pay
> attention to the files in the /misc/upgrade/metadata directory of your
> Firebird 2.1 installation as an essential step for preparing your databases
> for use under Firebird 2.1. If your databases contain metadata fields (like
> object names, comments/descriptions, computed field definitions, PSQL, view
> or constraint sources) storing non-ASCII data then the backup/restore
> upgrade cycle may not be enough to make them usable." This one sounds as if
> it's not relevant for me, as my databases are already used under Firebird
> 2.1, but I also believe having read, that about the same applies to blob
> fields and a change from Firebird 2.1 to Firebird 2.5.1. Maybe I mix up
> things here though...
>
> Regarding 3.1.4 and 3.2.4: I've found this example on how to change the
> blob type:
> UPDATE RDB$FIELDS SET RDB$FIELD_SUB_TYPE = 0
> WHERE RDB$FIELD_NAME=(SELECT RDB$FIELD_SOURCE FROM RDB$RELATION_FIELDS
> WHERE RDB$FIELD_NAME='SOME_TEXT_BLOB_FIELD' AND
> RDB$RELATION_NAME='SOME_TABLE');
> Seems to work without the content getting lost. Sure, it's in the wrong
> format that way, but it seems to be possible to obtaint the content and
> then to save it in the new format. But depending on the choice of the way
> to go, this could be irrelevant.
>
> All in one I would like to ask you for some assistance and an as detailed
> instruction as possible on how to proceed to accomplish the requirements
> from point 2, as this is a sensitive step and must not make any mistakes :)
>
> Many thanks and kind regards,
> Patrick
>
>
>


[Non-text portions of this message have been removed]