Subject: Re: [firebird-support] Re: Error loading DB using FB 2.1.3
Author: Steve Boyd
> Trust me, the feeling you get when the database recovery comes up with
> "file not readable" is not a good one!
>
>
Been there, had that feeling. That's why I now have an entire chest
full of tools that do partial backups from corrupted databases and
reload them into a new one. But there is a difference between unreadable
media and a complete backup that can't be reloaded because of some
piddly data consistency error.
>
>> We often run into situations where
>> duplicate keys that don't exist in the original database's index cause a
>> restore to fail.
>>
> That's interesting. Are you saying that a restore of a database with no
> duplicates causes duplicates? I've had that before - on Oracle - when
> the database language was incorrect and German characters with accents
> were losing the accents and becoming unaccented - and that caused a
> duplicate.
>
> However, the restore testing that we did showed us where we had a
> problem that needed to be fixed.
>
>
Yes. A restore of a database of plain English ASCII text with no
duplicates can occasionally cause duplicate key errors on the restore.
>
>> Somehow, records that can be selected using a NATURAL
>> plan do not exist in the primary index.
>>
> Again, interesting. Is your "primary index" actually a Primary Key
> Constraint? Or a Unique Index (or Unique Constraint)? Those
> Unique indexes/constraints do allow "duplicates" - provided all columns
> in the index are NULL - because those records are not actually indexed.
>
>
The primary index is a primary key constraint. Therefore all columns
are NOT NULL. There is apparently some extremely intermittent bug in
Firebird. I am not the only person to report this. I have gone a few
rounds with the FB development team but the problem is not reproducible
on demand and so we really got nowhere trying to track it down. It
seems to be related to deleting a key in one transaction at the same
time as it is being re-added in another.
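
For anyone who wants to hunt for the same symptom in their own databases,
it can be scripted. This is only a rough sketch, assuming the Python fdb
driver and a hypothetical CUSTOMERS table with an ID primary key; the DSN,
credentials, table and column names are all placeholders. It walks the
table with a forced natural scan and then tries to find each key again
through the normal indexed lookup; keys the scan sees but the lookup
cannot find point at exactly the index inconsistency I mean.

import fdb

# Rough sketch only: placeholder DSN, credentials, table and column names.
con = fdb.connect(dsn='localhost:/data/mydb.fdb',
                  user='SYSDBA', password='masterkey')
scan = con.cursor()
probe = con.cursor()

# Force a sequential scan so every committed row is seen regardless of the index.
scan.execute("SELECT id FROM customers PLAN (customers NATURAL)")

missing = []
for (key,) in scan:
    # An equality match on the primary key normally goes through the PK index.
    probe.execute("SELECT 1 FROM customers WHERE id = ?", (key,))
    if probe.fetchone() is None:
        missing.append(key)   # visible to the scan, invisible to the index

print("keys missing from the primary index:", missing or "none")
con.close()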
>
>> Testing every backup for
>> restorability is not exactly practical in the real world where I have
>> about 100 servers with a couple of hundred databases to support.
>>
> I agree. But some backups need to be tested - create a clone of your
> database with a "gbak -create ..." for example. Run some automated
> testing scripts or whatever.
>
>
>
True, but we could test Monday's backup on Tuesday, have an
inconsistency creep in on Tuesday and still be screwed on Wednesday. It
is really a no-win situation. All you end up doing is limiting the risk of
failure, not eliminating it. It becomes a trade-off between how much time you
can spare from your normal daily activities to do this kind of testing
and the amount of grief you are willing to put up with if the restore fails.
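
That said, the clone-and-test idea doesn't have to be elaborate. Here is a
rough sketch of the sort of thing I mean, assuming gbak is on the PATH and
using placeholder paths and SYSDBA credentials: restore last night's backup
into a scratch file and treat a non-zero exit code as a failed restore.

import subprocess
import sys
from pathlib import Path

backup_file = Path("/backups/mydb-monday.fbk")   # placeholder backup file
scratch_db = Path("/tmp/restore_test.fdb")       # throwaway restore target

if scratch_db.exists():
    scratch_db.unlink()    # gbak -c will not overwrite an existing database

result = subprocess.run(
    ["gbak", "-c", "-v",
     "-user", "SYSDBA", "-password", "masterkey",
     str(backup_file), str(scratch_db)],
    capture_output=True, text=True)

if result.returncode != 0:
    print("TEST RESTORE FAILED")
    print(result.stderr)
    sys.exit(1)

print("test restore completed; point your sanity queries at", scratch_db)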
>
>> Telling me to test every backup seems like documenting bad behaviour
>> rather than fixing it.
>>
> Possibly. If I create a successful backup with gbak or nbackup, all well
> and good - maybe nothing actually went wrong with the backup. However,
> what if something happened during the actual write to the device?
> Windows is known for not quite flushing all data to the disc from time
> to time, so it's possible that gbak/nbackup have completed a backup, but
> the underlying infrastructure has put the boot in and rendered the file
> unusable.
>
Well, I did say "excluding unreadable media". There are some things
that I don't expect gbak to be able to recover from. There is no magic,
after all. But simple data integrity errors should not render an entire
backup unusable. Warn me, tell me the primary key values, drop the
record if you must and carry on. Getting 99% of the data back and being
back in operation in a couple of hours is far better (from my point of
view) than getting 0% of the data and being down for days. Except,
perhaps, if you're a bank, but even then something has got to be better
than nothing.
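
Mind you, the nearest thing to that today, as far as I know, is restoring
with the indexes left inactive (gbak -c together with -i), so the data at
least comes back, and then hunting the duplicates down with plain SQL
before the indexes go back into service. A rough sketch, again with
placeholder table, column and credential names:

import fdb

# Assumes the database was just restored with "gbak -c -i ..." so the
# indexes are inactive; table, column and credential names are placeholders.
con = fdb.connect(dsn='localhost:/tmp/restore_test.fdb',
                  user='SYSDBA', password='masterkey')
cur = con.cursor()

cur.execute("""
    SELECT id, COUNT(*)
    FROM customers
    GROUP BY id
    HAVING COUNT(*) > 1
""")

for key, hits in cur:
    print(f"duplicate primary key value {key}: {hits} rows")

con.close()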


> That's a situation I'd like to know about, personally, rather than
> finding out when I really need *that* particular backup.
>
Again, the only way to know that a particular backup is good is to test
every backup. I've been burned often enough by bad tapes that I always
read a tape back after I write it but even that is no guarantee.
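
The disk equivalent is cheap to script, for what that's worth: hash the
file gbak produced, hash the copy that actually landed on the backup
media, and refuse to trust the copy if the two differ. A rough sketch
with placeholder paths:

import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

source = Path("/backups/mydb-monday.fbk")          # file gbak just wrote
copy = Path("/mnt/backup_media/mydb-monday.fbk")   # copy on the backup medium

if sha256_of(source) == sha256_of(copy):
    print("read-back OK")
else:
    print("READ-BACK MISMATCH - do not trust this copy")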
> I have Oracle databases in the terabyte range for which I still have to test
> that the backups are usable. Now that's a boring task of a Monday
> morning! ;-)
>
>
If I had to do even 100GB restores on most of our servers I would be
talking about days, not Monday morning. :(

