Subject | Re: [firebird-support] Upgrade Firebird 1.5.3 to 2.5.4 - Character Set and Collations |
---|---|
Author | Mark Rotteveel |
Post date | 2015-08-04T19:01:01Z |
On 31-7-2015 16:09, 'stwizard' stwizard@... [firebird-support] wrote:
were characters are encoded in 1-4 (technically up to 6) bytes, where
some byte combinations are simply not valid.
Your metadata has some bytes that are and higher than 127, and have
combinations that are not valid in UTF-8.
character set to character set unicodefss. This is also why you should
only do it **once**, as repeatedly apply this conversion for characters
outside the ASCII range will lead to conversion upon conversion.
If your original metadata was generated on a Western European or United
States/Canada system then yes (both ISO8859_1 and WIN1252 would be a
good choice), however if your company is located in other parts of the
world, then you'd need to select the character set that is valid for
that region.
For example byte 0xC4 (196) is Д in windows-1251 (which is 0xd094 in
UTF-8), and Ä in windows-1252 (which is 0xC384 in UTF-8), using the
wrong metadata character set would lead to 'corruption' of the metadata.
better to use a specific character set. If your application currently
works fine, then there is not much reason to change, however you might
run into problems when switching to systems or locales that have a
different default character set. Just changing your database to a
specific character set without carefully checking your software for
assumptions based on NONE, and testing your software is not wise.
the old database to the new.
answer.
Mark
--
Mark Rotteveel
> Greetings All,ASCII only allows byte values 0-127, while UTF-8 is a specific encoding
>
> I recently did a metadata only backup of my Firebird 1.5.3 database.
>
> Please note this database has a character set of NONE and anywhere where
> this could be set has remained NONE and no Collation was set either.
>
> NOTE: We DO NOT use BLOBs we use VarChar for our memo fields.
>
> On a new computer I installed Firebird 2.5.4 and all required external UDFs.
>
> I then attempted to do a metadata only restore using the NONE Character
> Set as follows:
>
> gbak -r -p 8192 -v -fix_fss_d NONE -fix_fss_m NONE -user SYSDBA -pas
> masterkey "C:\Frontline\CollectingIT\Database\COLLECTINGIT.bak"
> "C:\Frontline\CollectingIT\Database\CollectingIT_METADATA_RESTORE.fdb"
>
> and it failed with this error:
>
> gbak:restoring table CR_COMPLIANCE_CONDITION
>
> Error: Invalid metadata detected. Use -FIX_FSS_METADATA option.
>
> Malformed string
>
> Exiting before completion due to errors
>
> This I tried using ASCII for the character set and it failed with this
> error:
>
> gbak:restoring stored procedure SPS_CLEAR_BKRCY_ON_HOLD_STATUS
>
> Error: Cannot transliterate character between character sets
>
> gds_$put_segment failed
>
> Exiting before completion due to errors
>
> Then I tried UTF8 and receive the same error as above.
>
> Then I tried ISO8859_1 which was my last hope and it worked. Not sure
> why the other three failed.
were characters are encoded in 1-4 (technically up to 6) bytes, where
some byte combinations are simply not valid.
Your metadata has some bytes that are and higher than 127, and have
combinations that are not valid in UTF-8.
> Questions:The -fix_fss_m will convert the bytes in the metadata from the specified
>
> 1I assume this is only changing the character set for the PSQL (Stored
> Procedures, Triggers, etc.) to Firebird v2.5.4 standards is that right
> and if so is it OK to use this character set?
character set to character set unicodefss. This is also why you should
only do it **once**, as repeatedly apply this conversion for characters
outside the ASCII range will lead to conversion upon conversion.
If your original metadata was generated on a Western European or United
States/Canada system then yes (both ISO8859_1 and WIN1252 would be a
good choice), however if your company is located in other parts of the
world, then you'd need to select the character set that is valid for
that region.
For example byte 0xC4 (196) is Д in windows-1251 (which is 0xd094 in
UTF-8), and Ä in windows-1252 (which is 0xC384 in UTF-8), using the
wrong metadata character set would lead to 'corruption' of the metadata.
> 2We are located in the central part of the U.S. and need English onlyThat is a separate question. NONE has its uses, but in general it is
> (or only characters that can be entered on a standard U.S. keyboard), so
> is it OK to keep the character set as NONE?
better to use a specific character set. If your application currently
works fine, then there is not much reason to change, however you might
run into problems when switching to systems or locales that have a
different default character set. Just changing your database to a
specific character set without carefully checking your software for
assumptions based on NONE, and testing your software is not wise.
> a.Would it be better to change it to another character set and if soSee above
> which one would be best for English? If changing to another character
> set I need to be sure it will not change any existing data, nor affect
> the length of data that can be stored in the field.
> b.If it is recommended to change is there a way to update the entireCreate new metadata with correct character set and pump the data from
> database at once?
the old database to the new.
> Then this brings up the Collation setting on what is best to use withThe collation depends on your requirements; there is no single correct
> the character set chosen and how to change that for the entire database
> at once.
answer.
Mark
--
Mark Rotteveel