Subject Re: Slowdown when processing large tables
Author hvlad
> I'm trying to normalize few very large tables (40-80 million records) by
> introducing surrogate keys and dropping lot of redundant columns. The
> existing PKs and FKs are all VARCHARs and some of the foreign keys are
> missing.
> I made stored procedures that are processing records from the large
> table in batches and committing at the end of each batch. While running
> my trials, I have noticed that after a while the processing considerably
> slows down: depending on the size of the batch it goes for example from
> 6 seconds per batch to 8 minutes per batch. This is happening even when
> I'm trying to populate separate table with surrogate keys only.
> Using Sinatica monitor, I have observed that as the trial test
> progresses the fetches from the cache increased for initial period - and
> then drop off completely.

This is expected and shows that cache works ;)

> There is plenty of RAM available, and the
> processor seems to be idle, however the disk monitor shows that activity
> is close to 100% but only managing 2MB/sec (while it achieves more than
> 50MB/sec in other tests)
> I'm using Firebird SS 2.1.3 on Windows 7 and cache is configured to be
> 1G. No other process were running during the tests.
> Example of stored procedure used in tests:
> CREATE OR ALTER PROCEDURE POPULATE_DEST_TABLE (
> from_rec integer,
> to_rec integer)
> returns (cnt integer)
> as
> declare variable u_id varchar(150);
> declare variable d_id varchar(150);
> begin
> for select src.u_id, src.d_id
> from src_table src
> order by src.d_id, src.u_id -- this index has the best selectivity
> rows :from_rec to :to_rec

Why do you split src_table by this parts ? Every next step (batch) should read and skip again and again already copied rows and more and more at every step.

If you think it is absolutely necessary, remember src.d_id, src.u_id values of last copied row, add WHERE clause to the SELECT query, and put remembered values into it to faster skip already copied rows. And, of course, replace from_rec and to_rec parameters by single number of rows to copy.

> into :u_id, :d_id
> do
> begin
> insert into dest_table -- surrogate ids only, from mater tables
> (u_sid, d_sid)
> values ((select u_sid from u where u.user_id = :u_id),
> (select d_sid from d where d.d_id = :d_id));
> end
> cnt = row_count;
> suspend;
> end

Hope this helps,
Vlad