Subject Re: [firebird-support] import data from external table (huge file)
Author Helen Borrie
At 11:43 AM 24/08/2005 +0000, you wrote:
>Hi all!
>
> I'm trying to import the data from a 2.6 Gb fixed-length format
>external file with 35 million records into a regular table inside the
>database. Each line in the external table file is 76 characters long.
>I'm using a stored procedure to do the import, because I need to
>detect and store duplicates from the file in another table (violation
>of the table's primary key) and continue with the processing.
>
>I execute the SP (EXECUTE PROCEDURE INSERT_EXTERNAL_TABLE) with EMS
>QuickDesk and after 2 hours of processing (the database file grows until
>it reaches 5 Gb) I get the following error:
>
>
>EXECUTE PROCEDURE INSERT_EXTERNAL_TABLE
>fmSQLScript.Script:
>Unsuccessful execution caused by a system error that precludes
>successful execution of subsequent statements.
>I/O error for file "D:\SRC\INSERTFILES\DATA.DAT".
>Error while trying to open file.
>El dispositivo no reconoce el comando. [The device does not recognize
>the command.]
>.
>
>I'm using Firebird-1.5.2.4731_win32
>
>This is the stored procedure
>
>CREATE PROCEDURE INSERT_EXTERNAL_TABLE
>AS
> DECLARE VARIABLE A VARCHAR(13);
> DECLARE VARIABLE B VARCHAR(10);
> DECLARE VARIABLE C VARCHAR(9);
> DECLARE VARIABLE D VARCHAR(10);
> DECLARE VARIABLE E VARCHAR(20);
>
>BEGIN
> /* Procedure body */
>
> FOR SELECT A,B,C,D,E FROM EXTERNAL_TABLE
> INTO :A, :B, :C, :D, :E
> DO
> BEGIN
> INSERT INTO TABLE (A,B,C,D,E)
> VALUES (:A, :B, :C, :D, :E);
>
> WHEN SQLCODE -803 DO
> BEGIN
> INSERT INTO TABLE_TMP(A,B,C,D,E)
> VALUES (:A, :B, :C, :D, :E);
> END
> END
>
> EXIT;
>END
>
>Is the file too big to do the import?

Yes. However, unless you are using a FAT32 or an NTFS 4 disk partition,
it's not the size of the file but the number of rows being read. The
absolute number of row slots for a single table is 2^31 (just over 2
billion). So - once the SP has walked past the 2,147,483,648th row (or
perhaps a few less) the number is up, literally.

>Is there any other way of doing the import?

You will need to split the input file into much smaller chunks. But you
will eventually encounter the same limit on row slots when writing to the
internal tables. If the internal tables will be subject to updates,
deletes and/or further inserts, they will need to be sufficiently smaller
than 2.148 billion rows to allow for the existence of multiple record
versions and to retain capacity for growth. Somehow you will need to find
a way to spread these records across multiple tables by some kind of
grouping.
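
If physically splitting the 2.6 Gb file is awkward, one alternative is to
keep a single external table but read it in bounded slices. The following
is a sketch only: the procedure name INSERT_EXTERNAL_BATCH and its
SKIP_ROWS/BATCH_ROWS parameters are hypothetical, it reuses your table and
column names, and it assumes the parenthesised FIRST/SKIP expression
syntax that 1.5 accepts in PSQL.

SET TERM ^ ;

CREATE PROCEDURE INSERT_EXTERNAL_BATCH (SKIP_ROWS INTEGER, BATCH_ROWS INTEGER)
AS
  DECLARE VARIABLE A VARCHAR(13);
  DECLARE VARIABLE B VARCHAR(10);
  DECLARE VARIABLE C VARCHAR(9);
  DECLARE VARIABLE D VARCHAR(10);
  DECLARE VARIABLE E VARCHAR(20);
BEGIN
  /* Read one bounded slice of the external file per call, so the
     client can commit between calls instead of accumulating millions
     of uncommitted inserts in a single transaction. */
  FOR SELECT FIRST (:BATCH_ROWS) SKIP (:SKIP_ROWS) A, B, C, D, E
      FROM EXTERNAL_TABLE
      INTO :A, :B, :C, :D, :E
  DO
  BEGIN
    INSERT INTO TABLE (A, B, C, D, E)
    VALUES (:A, :B, :C, :D, :E);

    WHEN SQLCODE -803 DO
    BEGIN
      /* Primary key violation: divert the duplicate row, as in your
         original procedure. */
      INSERT INTO TABLE_TMP (A, B, C, D, E)
      VALUES (:A, :B, :C, :D, :E);
    END
  END
END ^

SET TERM ; ^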

Another problem with trying to process a huge file in one hit is the
strain on system resources from having so many uncommitted inserts. You
also need to break this task into separate transactions, so that you can
cleanly commit batches of about 8,000 - 10,000 records.
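
A stored procedure cannot commit its own work, so those commits have to
come from the client. A minimal sketch, assuming the hypothetical
INSERT_EXTERNAL_BATCH procedure above and a client (such as isql or your
script tool) that lets you commit explicitly between statements:

EXECUTE PROCEDURE INSERT_EXTERNAL_BATCH (0, 10000);
COMMIT;
EXECUTE PROCEDURE INSERT_EXTERNAL_BATCH (10000, 10000);
COMMIT;
EXECUTE PROCEDURE INSERT_EXTERNAL_BATCH (20000, 10000);
COMMIT;
/* ...and so on to the end of the file; better generated by a small
   client program than typed by hand. */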

/heLen