Subject Re: [firebird-support] Proper way for copy data from big table
Author Mark Rotteveel
On Fri, 19 Jun 2015 15:20:39 +0200, "liviuslivius
liviuslivius@... [firebird-support]"
<firebird-support@yahoogroups.com> wrote:
> what is the proper way to copy data from a big table to another
> database if the table is bigger than available RAM and we need to do
> this in one transaction?
> You know, if we do
> SELECT * FROM SOURCE_TABLE - and the table has e.g. 400 000 000
> records - then retrieving it all is impossible because of RAM
>  
> but if we do this in steps
>
> SELECT FIRST 1000000 SKIP 0 * FROM SOURCE_TABLE
>   <- we got 1000000 reads -> ok
> SELECT FIRST 1000000 SKIP 1000000 * FROM SOURCE_TABLE
>   <- we got 2000000 reads -> not ok
> SELECT FIRST 1000000 SKIP 2000000 * FROM SOURCE_TABLE
>   <- we got 3000000 reads -> not ok, worse and worse (slower and slower)

You have two problems:
* Increasing number of reads: Firebird still has to read all the skipped
rows to know what to skip, so each subsequent page costs more than the
last.
* No order: without an ORDER BY the result order is undefined, so paging
may yield incorrect results (duplicates, or missing records) if the
optimizer chooses a different plan for a different combination of
FIRST/SKIP values (I believe this is a theoretical concern for Firebird
right now, but not for other databases).

The proper way to do this is to execute a single query, and interleave
fetching rows from the source with inserting rows into the target. That
works without consuming all memory as long as the component you use for
querying doesn't fetch all rows at once and discards rows once they have
been processed.
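As a minimal sketch of that interleaved copy in plain JDBC with Jaybird
(untested; the connection URLs, credentials, and the SOURCE_TABLE /
TARGET_TABLE column lists are placeholders you would adjust):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class BigTableCopy {

    public static void main(String[] args) throws SQLException {
        // Placeholder URLs and credentials; adjust to your environment
        String sourceUrl = "jdbc:firebirdsql://localhost/source.fdb";
        String targetUrl = "jdbc:firebirdsql://localhost/target.fdb";

        try (Connection source = DriverManager.getConnection(sourceUrl, "sysdba", "masterkey");
             Connection target = DriverManager.getConnection(targetUrl, "sysdba", "masterkey")) {

            // One transaction per side: a single snapshot read on the
            // source, a single insert transaction on the target
            source.setAutoCommit(false);
            target.setAutoCommit(false);

            try (Statement select = source.createStatement();
                 PreparedStatement insert = target.prepareStatement(
                         "INSERT INTO TARGET_TABLE (ID, NAME) VALUES (?, ?)")) {

                // Single query; the driver fetches rows in batches,
                // never all 400 000 000 at once
                try (ResultSet rs = select.executeQuery(
                        "SELECT ID, NAME FROM SOURCE_TABLE")) {
                    int pending = 0;
                    while (rs.next()) {
                        insert.setLong(1, rs.getLong("ID"));
                        insert.setString(2, rs.getString("NAME"));
                        insert.addBatch();
                        // Flush inserts periodically so they don't pile up
                        if (++pending == 1000) {
                            insert.executeBatch();
                            pending = 0;
                        }
                    }
                    if (pending > 0) {
                        insert.executeBatch();
                    }
                }
                target.commit();
            } catch (SQLException e) {
                target.rollback();
                throw e;
            }
        }
    }
}

The batch size of 1000 for executeBatch is an arbitrary choice that only
reduces round trips to the target; the important part is that rows are
inserted as they are fetched, so the full table never lives in memory.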

For example, in Java the default result set is forward-only. When it has
no rows buffered, it fetches a batch from the server (the default fetch
size in Jaybird is 400 rows) and then serves the application's requests
for rows from that buffer. When it needs more, it discards the old batch
and fetches the next one. This ensures that memory is not exhausted
(assuming the application itself doesn't hold on to the retrieved rows).
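
If the default doesn't suit your workload, the fetch size can be tuned
per statement through standard JDBC. A minimal sketch (the helper method
and SOURCE_TABLE are hypothetical; 'source' is an open connection as in
the sketch above):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class FetchSizeDemo {
    static void streamIds(Connection source) throws SQLException {
        try (Statement select = source.createStatement()) {
            // Ask for 1000 rows per server round trip instead of
            // Jaybird's default of 400; each consumed batch is
            // discarded before the next one is fetched
            select.setFetchSize(1000);
            try (ResultSet rs = select.executeQuery(
                    "SELECT ID FROM SOURCE_TABLE")) {
                while (rs.next()) {
                    long id = rs.getLong(1); // only the current batch is buffered
                }
            }
        }
    }
}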

Mark