firebird-support - Re: [firebird-support] Re: [Firebird-devel] Sweep process too many writes to disc

Subject	Re: [firebird-support] Re: [Firebird-devel] Sweep process too many writes to disc
Author	liviuslivius
Post date	2013-01-04T06:56:35Z

>>Why scan? Why not read and clean up at the same time?

If this happens in one pass, then I am even more surprised by the volume of reads and writes.

Some more information:
1. I use 64-bit SuperServer.
2. The DB page size is 16k and DefaultDbCachePages is 65536.
3. The database was backed up and restored.
Then I made a few changes to some stored procedures and ran some analytics (selects),
and after that I deleted 25% of the data from the db (only deletes, no inserts or updates).

4. I disconnected from the database, renamed it to check that it was not in use, and restarted fbserver.

5. I ran the sweep process.

As you can see, there are no repeated updates to the same record, and the sweep was the only attachment to the database.
After 7 hours I connected to the database to see what was going on in the MON$ tables.
I see many writes and many reads, and the same in Windows Task Manager.

I suppose that sweep does a write for every record version it removes from a page, not one write after all unnecessary record versions have been removed from the whole page.
If, e.g., I have 20 old record versions in one page, then this page is saved to disk 20 times...

Is this the right clue?
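
A rough back-of-the-envelope check of the numbers in this thread, written as a small Python sketch. The figures 1154 GB and 52.74 GB come from the posts. The two flush policies are only a toy model of the hypothesis above, and the helper names are made up for illustration. Nothing here reflects how the Firebird engine actually schedules page writes.

```python
def write_amplification(written_gb, db_size_gb):
    """Ratio of bytes written to database size (units cancel out)."""
    return written_gb / db_size_gb


def page_writes(old_versions_per_page, flush_per_version):
    """Toy model: count page writes under two hypothetical flush policies.

    old_versions_per_page: number of removable record versions on each page.
    flush_per_version=True  -> one page write per removed version (the hypothesis).
    flush_per_version=False -> one page write per page that had anything removed.
    """
    if flush_per_version:
        return sum(old_versions_per_page)
    return sum(1 for n in old_versions_per_page if n > 0)


ratio = write_amplification(1154, 52.74)

# Three hypothetical pages: 20 old versions, none, and 5.
pages = [20, 0, 5]
per_version = page_writes(pages, flush_per_version=True)
per_page = page_writes(pages, flush_per_version=False)

print(f"write amplification: {ratio:.1f}x")
print(f"writes if flushed per version: {per_version}, per page: {per_page}")
```

If every page were written about once per removed version, an average of roughly 20 removable versions per page would be enough to explain an amplification of this size. That is only an arithmetic consistency check, not evidence that sweep behaves this way.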

Karol Bieniaszewski

On 2013-01-03 23:41:28, Ann Harrison <aharrison@...> wrote:

On Thu, Jan 3, 2013 at 4:22 PM, Karol Bieniaszewski wrote:
> I have a problem understanding the internal workings of sweep.
> My db size is 52.74GB. I detached from the db, restarted the fb server and ran sweep with
> gfix -sweep. Now I see in Task Manager that Firebird 2.5.3.26584 has written
> 1154 GB to disk.
> This is 21 times bigger than the db itself!
>
> In my view sweep should do the following:
>
Err, the code is really definitive, regardless of what you think it should
do.
> 1. scan every db page and collect info about record versions from
> transactions
> between Oldest Transaction and Oldest Active Transaction
>
Why scan? Why not read and clean up at the same time? Unless things have
changed, or my memory is failing (both possible), what sweep (like every
other transaction in cooperative garbage collection mode) does is read the
header page to determine the oldest transaction that was active when the
currently oldest transaction started. Any record that has a version that
age and versions older than that can be cleaned up. Then the sweep starts
reading every record in every table, starting with the first table in
RDB$RELATIONS and reading records in storage order. The read forces
garbage collection.
So, this is not a mark and sweep sort of sweep, but a single sweep.
However, back versions that are stored on different pages may complicate
the sweep.
> 2. after the first step, sweep should process old record versions the way
> the garbage collector does, and then
>
See above.
> 3. advance Oldest Transaction to the value Oldest Active Transaction
> had at
> the time when sweep started.
>
Right, or at least close. If there happens to be an old transaction stuck
in the first phase of a two phase commit, that's going to be the new Oldest.
>
> The garbage collector should work in page lock mode, and once all info from a page
> has been processed, if some old record versions were removed, write the whole
> page to the hard drive.
>
The sweep process holds a write lock on each data page while it is making
changes to it. If other transactions request a lock, the sweeper will
release the page once it is internally consistent. That's the case for all
>
> Can you tell me what I omitted?
>
Ah, one possibility is that you have fragmented records or records with
back versions on different pages. The way all garbage collection (sweep,
cooperative, or special thread) works is that the most recent version of a
record is stored in a known location. Back versions are located by
following a pointer from the newest to the next older and from that to the
next older, etc. Fragmented record versions work much the same way. If
you have not left free space on pages, or if you've done a lot of updates
to a single record, the newest version may not be on the same page with the
older versions. Worse, the old versions may be on different pages.
So, although the sweep is reading the first record from your table, its
back versions may all be on different pages.
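
To illustrate the point about back-version chains, here is a minimal Python sketch. It assumes a record is represented only by the list of page numbers that hold its versions, newest first. This representation and the function names are hypothetical and are not Firebird's on-disk structures. The sketch only counts how many distinct pages would have to be visited, and how often following the chain jumps to another page.

```python
def pages_touched_by_cleanup(chain_pages):
    """Number of distinct pages holding any version of one record.

    chain_pages: page numbers of the record's versions, newest first.
    """
    return len(set(chain_pages))


def page_switches(chain_pages):
    """How often following the chain moves from one page to a different one."""
    return sum(1 for a, b in zip(chain_pages, chain_pages[1:]) if a != b)


# All versions fit on the primary page: one page to read and modify.
compact_chain = [10, 10, 10, 10]

# Back versions spilled onto other pages (hypothetical page numbers).
scattered_chain = [10, 57, 203, 57]

for name, chain in (("compact", compact_chain), ("scattered", scattered_chain)):
    print(name, pages_touched_by_cleanup(chain), page_switches(chain))
```

In the scattered case, cleaning up a single record dirties three pages instead of one, and the walk returns to a page it has already visited. Whether that page is still in memory at that point depends on the cache.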
> Does sweep work this way, or differently? If it does, I should see many reads and
> only a few writes to disk, but this is not what happens. I see >1TB of writes to disk
> for a db size of 52.74GB.
>
That seems extreme, but leads to several questions.
One is whether your application regularly makes large changes to the same
record, or several changes in the same transaction. Back versions are
normally stored as differences from the next newer version, so they're
usually small. However, if you change more than 255 bytes in a record, or
if you change the record more than once in a transaction, the whole back
version is stored - generally much larger than a difference record and
therefore more likely to go off page.
Another is what cache size you give the sweep. If you've got back versions
off page, then you'll need a larger cache to keep from writing the same
page over and over.
A third is what other processing is going on simultaneously. Sweep does
not hold an exclusive lock on a page while it scrubs the entire page, only
for long enough to make the page consistent. If other transactions need
the page, it will be released - and at this point the number of writes
varies depending on whether you're running one of the Classics or
SuperServer. In neither case is having sweep compete with update
transactions a good thing. Necessary sometimes, but not performance
enhancing.
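
The cache-size question can be illustrated with a toy simulation in Python. It assumes a plain LRU page cache in which every access dirties the page and a dirty page is written once when it is evicted, plus once more at the final flush. The real Firebird page cache has its own replacement and write policies, so treat the function name, the policy and the access pattern as assumptions made purely for illustration.

```python
from collections import OrderedDict


def count_page_writes(dirty_accesses, cache_pages):
    """Count disk writes for a sequence of page modifications.

    dirty_accesses: page numbers in the order they are modified.
    cache_pages: how many pages the (hypothetical) LRU cache can hold.
    """
    cache = OrderedDict()  # page number -> dirty flag, oldest entry first
    writes = 0
    for page in dirty_accesses:
        if page in cache:
            cache.move_to_end(page)  # already cached and dirty, no write yet
            continue
        if len(cache) >= cache_pages:
            cache.popitem(last=False)  # evict least recently used dirty page
            writes += 1
        cache[page] = True
    return writes + len(cache)  # final flush of whatever is still cached


# Cleanup keeps returning to the same three pages (e.g. off-page back versions).
accesses = [1, 2, 3, 1, 2, 3]

small_cache_writes = count_page_writes(accesses, cache_pages=2)
large_cache_writes = count_page_writes(accesses, cache_pages=3)
print(small_cache_writes, large_cache_writes)
```

With a cache one page too small, each of the three pages is written twice. With room for all three, each is written once. The same effect, scaled up, is one way a sweep could write far more than the size of the database.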
Sorry I didn't pick this one up in Support. This has been a somewhat
harried time for me. I've copied the support list because the information
is more appropriate there.
Good luck,
Ann
P.S. You might think a mark and sweep might be more efficient, and there
might be a solution there, but it would require a completely different sort
of sweeper and a few bits that were not available. My first five thoughts
almost certainly increase the number of writes.
Sweep works through the normal database interface - it's just a program
that reads the database, using cooperative garbage collection so it cleans
up old versions immediately. Its one clever trick is knowing how to reset
the oldest interesting transaction on the page header.
A lower level sweeper could keep track of the page index numbers of back
versions that should be garbage collected. That would involve reading
pages that hold off-page back versions. But the list would be invalid if
any other transaction or the garbage collect thread removed those old
versions and reused the space. Garbage collecting new records is a bad
idea. Of course, the sweeper could mark collectible versions, but that
would require one write for the mark and another for the removal. Probably
somebody smarter could come up with a better method. As far as I can tell,
any such method would require a sweeper that operates at a much lower level
than the current one, making it more fragile and higher maintenance.
