Subject | Slowdown when processing large tables
---|---
Author | Nenad |
Post date | 2010-11-29T19:23:46Z |
Hi
I'm trying to normalize a few very large tables (40-80 million records) by
introducing surrogate keys and dropping a lot of redundant columns. The
existing PKs and FKs are all VARCHARs, and some of the foreign keys are
missing.
I made stored procedures that process records from the large table in
batches, committing at the end of each batch. While running my trials, I
noticed that after a while the processing slows down considerably:
depending on the batch size, it goes, for example, from 6 seconds per
batch to 8 minutes per batch. This happens even when I'm only trying to
populate a separate table with surrogate keys.
Using Sinatica Monitor, I have observed that as the trial progresses,
fetches from the cache increase for an initial period and then drop off
completely. There is plenty of RAM available and the processor seems to
be idle, yet the disk monitor shows activity close to 100% while only
managing 2 MB/s (it achieves more than 50 MB/s in other tests).
I'm using Firebird SS 2.1.3 on Windows 7, and the cache is configured to
be 1 GB. No other processes were running during the tests.
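For reference, a 1 GB page cache corresponds to a firebird.conf entry along these lines, assuming an 8 KB database page size (the page count would need adjusting for other page sizes):

```
# firebird.conf: page cache size in pages, not bytes.
# 131072 pages x 8192 bytes/page = 1 GB
DefaultDbCachePages = 131072
```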
Example of stored procedure used in tests:
CREATE OR ALTER PROCEDURE POPULATE_DEST_TABLE (
    from_rec integer,
    to_rec integer)
returns (cnt integer)
as
declare variable u_id varchar(150);
declare variable d_id varchar(150);
begin
  for select src.u_id, src.d_id
      from src_table src
      order by src.d_id, src.u_id -- this index has the best selectivity
      rows :from_rec to :to_rec
      into :u_id, :d_id
  do begin
    insert into dest_table -- surrogate ids only, from master tables
      (u_sid, d_sid)
    values ((select u_sid from u where u.user_id = :u_id),
            (select d_sid from d where d.d_id = :d_id));
  end
  cnt = row_count;
  suspend;
end
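The batching described above could be driven from an isql script that calls the procedure for consecutive row ranges and commits after each call; a minimal sketch (the batch boundaries are illustrative, not the actual values used):

```sql
-- Illustrative driver: walk the source table in 100,000-row windows,
-- committing after each batch to keep each transaction short.
SELECT cnt FROM POPULATE_DEST_TABLE(1, 100000);
COMMIT;
SELECT cnt FROM POPULATE_DEST_TABLE(100001, 200000);
COMMIT;
-- ...continue until to_rec reaches the row count of src_table.
```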
Hope someone can help me out with this one.
Regards,
Nenad