Subject Re: Handling large imports while system is in production
Author rudibrazil78
--- In firebird-support@yahoogroups.com, "karolbieniaszewski" wrote:
>
>
>
> --- In firebird-support@yahoogroups.com, "un_spoken" wrote:
> >
> >
> >
> > --- In firebird-support@yahoogroups.com, "rudibrazil78" wrote:
> > >
> > > Hi all
> > >
> > > We've been having to deal with a bad situation for quite a few years now: how to handle large imports without taking the server down (in fact the data is meant to be accessible in production as soon as it's ready).
> >
> > We have the same problem. We import about 1,000,000 rows daily. Around 100,000 of them are new rows; the rest update rows already in the database. The import takes about an hour per 100k rows, and during the import many things can go wrong..
> >
> > I would like to hear your strategies for this sort of problem:)
> >
>
>
> Hi,
>
> I import 300,000,000 rows into my DB each day with a MERGE statement,
> and I use external tables for the import.
>
> The whole import takes 2 hours, and the performance degradation is acceptable.
> With 1,000,000 rows and an external table you can finish in a few minutes, e.g. 3 minutes. External tables are the solution for all imports.
>
> regards,
> Karol Bieniaszewski
>
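For anyone reading this in the archives later: as I understand it, the approach Karol describes looks roughly like this (table and column names are my own invention, and I'm assuming field1 is the matching key):

merge into prod_table p
using external_table e
    on p.field1 = e.field1
when matched then
    update set p.field2 = e.field2
when not matched then
    insert (field1, field2) values (e.field1, e.field2);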

Karol, quick question about importing from external tables (if you are able to answer).

The application I'm working on for this issue is run by the end client alone; it would be impossible for us to manually help them move files or validate data.

It's very common for some records on the client's side to be unusable or invalid: characters in int fields, etc.
Since we currently process each insert individually, we can return errors only for the bad lines and let the import continue on the good ones.
This is critical because the quality of the import files varies widely and we do not provide data sanitization services to our clients; we therefore return the bad rows to them.
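For context, the external table I'd be reading from is declared along these lines (the file path and column widths are just placeholders, and the directory has to be allowed via ExternalFileAccess in firebird.conf):

create table external_table external file '/data/imports/batch.dat' (
    field1 char(10),   -- external tables read fixed-width records,
    field2 char(100)   -- so CHAR columns are the usual choice
);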

So I'm considering adapting my external-table solution with a FOR block, so I can process each insert too.

Something like this (just an example of the general idea, with illustrative variable types):

execute block as
    declare f1 varchar(20);   -- read the raw values as text so that a bad
    declare f2 varchar(100);  -- value fails the insert, not the fetch
begin
    for select field1, field2
        from external_table
        into :f1, :f2
    do begin
        -- the cast happens inside the loop body, so a non-numeric
        -- field1 raises here and is caught by the handler below
        insert into prod_table (field1, field2)
            values (cast(:f1 as integer), :f2);
        when any do
        begin
            -- singleton error handling code goes here;
            -- a failed row won't break the whole import
        end
    end
end
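To actually return the bad rows, the handler could log them instead of just swallowing the error; here's a sketch assuming Firebird 2.5+ (for autonomous transactions) and a hypothetical import_errors table:

create table import_errors (
    raw_field1 varchar(20),
    raw_field2 varchar(100),
    fail_sqlcode integer,
    logged_at timestamp default current_timestamp
);

and then, in place of the comments in the WHEN ANY block above:

            -- log in an autonomous transaction so the error row
            -- survives even if the surrounding work is rolled back
            in autonomous transaction do
                insert into import_errors (raw_field1, raw_field2, fail_sqlcode)
                values (:f1, :f2, sqlcode);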


What I'm wondering is whether processing records with a FOR loop like this will be much slower than inserting them all at once with a simple "insert into ... select".
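For reference, the all-at-once version I'd be comparing against is just this (same placeholder names as above). Note that here a single bad value aborts the entire statement, which is exactly the behaviour I'm trying to avoid:

insert into prod_table (field1, field2)
select cast(field1 as integer), field2
from external_table;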