Subject Re: [ib-support] Importing data into Firebird
Author Helen Borrie
At 05:32 AM 29/01/2003 +0000, you wrote:
>Hi all,
>
>I have the following problem:
>- We are in the process of migrating large database tables - millions
>of records per table - from an old ISAM-based DBMS
>(CTOS OS - anybody else still using it? ;-) ) to Firebird 1.0.
>The raw data was successfully imported into Firebird using
>external tables.

That's a good start. :-)

>This is OK.
>Now we have to perform extensive data manipulations (restructurings,
>normalize-denormalize, conversions, translations ...), because the
>new database will have a completely different structure than
>the old one.
>
>The questions:
>- Is it possible somehow to keep track of the progress of long-running
>update batches and stored procedures from ISQL or another
>DB admin tool? I mean, to display the number of processed records.

Yes, if you design your process so that it breaks back to the client from
time to time and commits records in batches of about 30,000, and so that
it logs things like exceptions. Alternatively - or in combination with
your breaks and logging - you can use events. All in all, to get feedback
from the server to the client you have to provide the means to feed
something back.

Much more information about your environment would be needed before it
would be useful to start suggesting how you might achieve these things.
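
Just to give the flavour, though, here's a rough sketch of the kind of
procedure I mean. All the names (SRC_ORDERS, NEW_ORDERS, BATCH_DONE and
so on) are invented for illustration - substitute your own structures.
The procedure works through one bounded slice of the source rows, hands
control back to the client so the client can commit, and returns the last
key it processed so the next call knows where to start. It also posts an
event; bear in mind that event notifications reach listening clients only
when the posting transaction commits.

SET TERM ^ ;

CREATE PROCEDURE MIGRATE_ORDERS_BATCH (
  START_ID INTEGER,
  BATCH_SIZE INTEGER)
RETURNS (
  LAST_ID INTEGER,
  ROWS_DONE INTEGER)
AS
DECLARE VARIABLE V_CUST CHAR(10);
DECLARE VARIABLE V_AMOUNT NUMERIC(15,2);
BEGIN
  ROWS_DONE = 0;
  LAST_ID = START_ID;
  FOR SELECT SRC_ID, CUST_CODE, AMOUNT
      FROM SRC_ORDERS
      WHERE SRC_ID > :START_ID
      ORDER BY SRC_ID
      INTO :LAST_ID, :V_CUST, :V_AMOUNT
  DO
  BEGIN
    INSERT INTO NEW_ORDERS (ORDER_ID, CUSTOMER_ID, TOTAL)
      VALUES (:LAST_ID, :V_CUST, :V_AMOUNT);
    ROWS_DONE = ROWS_DONE + 1;
    IF (ROWS_DONE >= BATCH_SIZE) THEN
    BEGIN
      /* this slice is done - tell any listening client and break
         back to the caller so it can commit */
      POST_EVENT 'BATCH_DONE';
      EXIT;
    END
  END
  /* fewer rows left than a full batch - the run is finished */
  POST_EVENT 'BATCH_DONE';
END ^

SET TERM ; ^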

>Currently I am able to watch only the hour-glass cursor on my
>screen. Some of the batches take hours, and I have no insight into
>their progress.
>- Is it possible to cancel the long-running batches?

What do you mean by "batches"? You cannot stop an SQL query once it is
running - hence one of several good reasons to keep the selection of
source data under good control with the aforementioned mechanisms.
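
To give an idea of what "under good control" buys you: with a batching
procedure like the sketch above, the client - plain ISQL will do - runs
one slice per transaction and commits each time. "Cancelling" then just
means not issuing the next call; everything committed so far stays put,
and the returned LAST_ID tells you where to pick up again later. The
numbers below are only placeholders:

EXECUTE PROCEDURE MIGRATE_ORDERS_BATCH (0, 30000);
/* ISQL displays LAST_ID and ROWS_DONE here */
COMMIT;
/* pass back the LAST_ID the previous call displayed */
EXECUTE PROCEDURE MIGRATE_ORDERS_BATCH (123456, 30000);
COMMIT;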

>(if an update batch does not finish in a reasonably long time
>interval, then it's likely that something went wrong ...).

You need to intercept the problem as soon as it happens. You can use
exceptions to handle problems - it is up to you whether you "eat" an
exception by simply bypassing it or logging it, or have it terminate the
procedure by breaking out of it (the as-designed behaviour of the built-in
exception handling). In the latter case, again, it is up to you whether
you deal with the problem data from the client before resuming, or just
bypass the problem, commit what has succeeded so far, and resume.
(Resuming assumes your procedure is passing back, in output parameters,
suitable key values on breaks to provide inputs for resuming where you
left off.)
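
For what it's worth, the two styles look roughly like this in PSQL.
MIG_ERRORS and E_BAD_ROW are names I've made up for the sketch; you
create them yourself first:

CREATE TABLE MIG_ERRORS (
  SRC_ID   INTEGER,
  ERR_NOTE VARCHAR(80)
);

CREATE EXCEPTION E_BAD_ROW 'Problem row encountered during migration';

Then, inside the loop of the earlier sketch, wrap the insert in its own
block with a handler:

  BEGIN
    INSERT INTO NEW_ORDERS (ORDER_ID, CUSTOMER_ID, TOTAL)
      VALUES (:LAST_ID, :V_CUST, :V_AMOUNT);
    WHEN ANY DO
    BEGIN
      /* "eat" the exception: log the offending key and carry on... */
      INSERT INTO MIG_ERRORS (SRC_ID, ERR_NOTE)
        VALUES (:LAST_ID, 'insert failed');
      /* ...or raise a user-defined exception here instead to break
         out of the procedure:
         EXCEPTION E_BAD_ROW; */
    END
  END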

>If I stop the Firebird server while there is a running update
>process, then I suppose that it could lead to data corruption.

In a best-case scenario, stopping the server doesn't corrupt
data. However, you will have an unresolved transaction to deal with when
you restart and, since you don't know what the conditions were when you
stopped the server, any unresolved updates will be irretrievable. At some
point you will also have the task of running gfix, or a backup and
restore, to remove the lost transaction and its uncommitted record versions.

In short, stopping the server isn't a friendly way to control a
long-running query. On a Windows server it tends to be risky, especially
if you have ill-advisedly turned forced (synchronous) disk writes off. A
well-controlled process is much to be preferred.

>- Is there a well-tried practice to carry out such data migration
>tasks?

I've done lots of data migrations and the single thing they all had in
common was that they had nothing in common - a slight exaggeration, I admit
- but certainly every migration task has its own peculiarities. I always
like migrations where it's possible to get ALL of the legacy data into
external tables as type char(). From that point, they go into transitional
tables in a Firebird database, where I don't touch the imported data at
all, but simply add the keys I want to use for batching and
converting. These keys are generally just unique row identifiers for each
table. I *never* do any of the "final" stuff, like constraints or
indexing, on transitional tables.
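
In DDL terms, the pattern is roughly this - the names, column sizes and
file path are all invented for illustration, and if your export files
carry line terminators you need an extra filler CHAR column so each
record exactly matches the declared row length:

CREATE TABLE EXT_CUSTOMERS EXTERNAL FILE '/data/customers.dat' (
  CUST_CODE CHAR(10),
  CUST_NAME CHAR(40),
  BALANCE   CHAR(12)
);

CREATE GENERATOR GEN_TRANS_CUST;

/* transitional table: same char() columns, untouched, plus the
   row identifier used later for batching and converting */
CREATE TABLE TRANS_CUSTOMERS (
  ROW_ID    INTEGER,
  CUST_CODE CHAR(10),
  CUST_NAME CHAR(40),
  BALANCE   CHAR(12)
);

INSERT INTO TRANS_CUSTOMERS (ROW_ID, CUST_CODE, CUST_NAME, BALANCE)
  SELECT GEN_ID(GEN_TRANS_CUST, 1), CUST_CODE, CUST_NAME, BALANCE
  FROM EXT_CUSTOMERS;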

At that point, I'll perform a backup and restore on the transitional
database and copy the restored transitional database to a CD. That
gives me a clean, complete reference to the legacy database's data, which I
can use and reuse throughout the migration cycle, first to extract test
data and, when I'm ready, to be the source for the real data pump. It also
means that I can work on the real database schema at a very abstract level
and be alerted to many of the potential conversion problems well before I
begin the actual pumping.

heLen