Subject Re: [firebird-support] Strange restore problem in embedded firebird
Author Dave Hughes
Hi Helen,

Thanks for the quick response! Unfortunately, I can say for certain
that your theory that I'm not actually connecting to the embedded
server can't be the cause here. In trying to keep my note as brief as
possible, I think I left out too much information. I'll try and correct
that below...

On Sun, 11 Dec 2005 10:26:49 +1100
Helen Borrie <helebor@...> scribbled:

> At 06:45 PM 10/12/2005 +0000, you wrote:
> >Hi,
> >
> >For a few months I've been working on an application using the
> >Embedded Firebird database engine (currently using the latest 1.5
> >release, 1.5.2.4731). However, I've been encountering a strange
> >problem when restoring databases from a backup. I've managed to
> >workaround the problem, but it's a horrible method which I suspect
> >is bound to fail at some point, and I'd really like to find out how
> >to fix it properly...
> >
> >Without going into too much boring detail, a "document" in this
> >application is essentially a *backup* of a Firebird database (the
> >application needs to use ALTER TABLE quite a bit, so using backups
> >in this way mitigates the problem of a database freezing after a
> >table has been altered 200 or so times). Hence, the usual operations
> >in the File menu are implemented as follows:
> >
> >
> >New
> >===
> >1. Use isc_dsql_execute_immediate to execute a CREATE DATABASE
> >statement, which creates a new firebird database in the system's TEMP
> >directory
> >2. Open a connection to the database and execute some SQL to
> >create various tables and views in the database that the application
> >needs
> >
> >Save
> >====
> >1. Close the connection to the database
> >2. Use isc_service_attach to attach to the local service manager
> >3. Use isc_service_start with action isc_action_svc_backup to backup
> >the database from the system's TEMP directory to whatever file the
> >user specified in a save dialog
> >4. Use isc_service_detach to detach from the local service manager
> >5. Reopen the connection to the database
> >
> >Close
> >=====
> >1. Close the connection to the database
> >2. Delete the database from the TEMP directory
> >
> >Open
> >====
> >1. Use isc_service attach to attach to the local service manager
> >2. Use isc_service_start with action isc_action_svc_restore to
> >restore the file specified by the user in an open dialog to a
> >database in the system's TEMP directory
> >3. Use isc_service_detach to detach from the local service manager
> >4. SLEEP...
> >5. Open a connection to the database
> >
> >
> >The problem is step 4 in the Open operation. If, after restoring the
> >database from the user-specified file I *immediately* open a
> >connection to it and start running queries I always wind up with one
> >of two errors:
> >
> >1. Either the Firebird engine throws an error stating that the
> >database was truncated and it hit the end of the file before it
> >expected to 2. Or, the first query to get run fails complaining that
> >the view or table it's trying to access doesn't exist
> >
> >However, if I have the application sleep for at least a second after
> >restoring the database, but before opening a connection and querying
> >things then it works perfectly! This feels like a race condition to
> >me, as though the restore operation is happening in a background
> >thread and hasn't really finished by the time isc_service_detach
> >returns.
> >
> >So, are there any steps I'm missing? Should I be waiting on something
> >before trying to open a connection to the newly restored database? Or
> >is it possibly a bug in the embedded engine itself?
> >
> >For reference, I'm writing the application in Delphi, and using the
> >ZeosDBO components (zeoslib.sourceforge.net) for connecting to the
> >embedded Firebird database engine (as well as other remote
> >databases). Any suggestions gratefully accepted!
>
> Without going into a step-by-step breakdown, may I suggest that
> what's happening here is not what you think is happening? From your
> description, you *think* you are doing all this stuff using the
> embedded client/server...

Actually, the embedded engine is the only thing the application could
be connecting to because it's the only Firebird engine available.
Specifically, the machine I'm developing on has no other Firebird or
Interbase server (or client) installed on it (I was careful not to
install Interbase when installing Delphi just in case it might
conflict with the embedded engine).

In other words, it's got the content of the Firebird embedded archive
(Firebird-1.5.2.4731_embed_win32.zip) and nothing else.

> However, whenever your application (of gbak) is connecting to
> localhost (or any other "remote'" server (as it must to use
> service_mgr) the client inside the fbembed library looks for a
> running server at that remote location (which will be the local
> machine if it's localhost). Why? Because the embedded server can't
> accept remote connections *at all*.
>
> It looks to me as though, at one point or another, the version of
> gbak you are running is not the one that matches the database's
> structure. Also, depending on where the application call lands, it
> might not even be using the client library that you think it is,
> since service_mgr attaches through the client that is local to it.

In fact I'm not running gbak to backup or restore the database at all.
As mentioned above, the only Firebird engine on the machine is the
embedded engine, which doesn't come with gbak or the other utilities
distributed with the full server. The steps in the file related
operations described in the original note (New, Save, Open, Close)
include the calls to functions in the embedded engine that call the
backup and restore capabilities of the services API
(isc_service_attach, isc_service_start, etc).

I suspect, though I admit that I haven't looked at the code, that
this is what gbak itself does (i.e. that gbak itself doesn't backup the
database, rather it attaches to the services API and asks it to backup
or restore the database).

To implement the backup / restore code in my application, I looked at
the implementation of the TBackupService and TRestoreService components
(from the IBX package that ships with Delphi 6), figured out the bare
minimum that was needed to run a backup and restore via the services
API without calling any external applications, and implemented that in
my application.

Studying how gbak actually works is something I'm trying to do at the
moment ... unfortunately my C++ knowledge is proving rather inadequate
to the task!

> The fact that you are actually getting backup files and restores at
> all by the described steps indicates that, at the very least, the
> service_mgr call from the embedded library is actually finding a
> running server. But because you are getting only partial restores,
> the obvious conclusion is that the restore bombs out and your
> application isn't intercepting that fact - it's only concluding that,
> because the call to gbak returned, it must have finished all the way
> to the end.

Well, as established, the restore can't be bombing out because it's
occuring from within my application (rather than an external
application like gbak). If it bombed out, it'd be taking my
application with it.

But here's the odd thing: I only get partial restores if I don't make
the application sleep between the restore operation and opening a
connection to the database. Because the backup and restore calls are
implemented in my application (without calling an external
application), I can be sure that the restore operation, and the
subsequent opening of the connection should be occurring in the same
thread of execution. At least, my code certainly executes these
operations in one thread -- whether the service manager actually
executes the restore operation in that thread or a background thread of
its own making I couldn't say at the moment, but I can't see how else a
race condition could occur (assuming, as I suspect, that this is a race
condition)?

> We can definitely conclude that the restore DID finish, at some point
> before the end, since the embedded client - requiring, as it does ,an
> exclusive lock - would not be able to connect to the restored
> database at all if gbak were still connected to it.
>
> This message does not in any way endorse the concept that designing
> an application such that it's allowed to modify a table 200+ times is
> in any way as an intelligent design strategy....or try to comprehend
> why you need to even try to perform service operations through the
> embedded server when you have a full server running on the same
> machine....

On the point of the design strategy, I agree completely: designing an
application to deliberately run lots of ALTER TABLE statements would be
silly. That's not actually what my application does, but it's something
that *could* occur over time assuming a particular pattern of user
behaviour (a pattern which wouldn't be unreasonable from an end user's
perspective).

Here's an attempt at a brief explanation (not my forte as we've
discovered!): part of the application allows users to run queries
against remote databases (Oracle, DB2, MySQL, whatever). The results of
such a query are stored in a table in the embedded firebird database.
There are several modes in which such queries can be executed. One of
these modes allows the query to *update* the results from a prior run
of the query, instead of replacing them. In this mode, if the query
returns a differently structured result set (either because the query
has changed or the source database has changed) the application makes a
"best effort" attempt at preserving the results of the prior run by
running ALTER TABLE statements to remove columns which have disappeared
from the result set, add columns that have appeared, and change the
data-types of columns as necessary.

Although unlikely that a user would change a query 200+ times in a
single run of the application, it is conceivable that the user might
alter a query 200+ times over the course of many weeks or months.
Without the backup / restore between runs of the application, this would
leave the user with an inoperative database.

This is one of the reasons I chose to implement the documents of the
application as *backups* of a firebird database rather than just a
firebird database itself (there are other reasons, and indeed other
patterns of behaviour that could lead to 200+ ALTER TABLE statements
being executed over a period of time, but they're either less
important or less likely).

Anyway, thanks again for the quick response. Sorry my original note
wasn't clear enough; hopefully I've cleared up any confusion in this
one but please ask if you need any more details.

Thanks,

Dave.