| Subject   | Re: Garbage, was Re: [firebird-support] SubSelect problems |
|-----------|-------------------------------------------------------------|
| Author    | Ann W. Harrison |
| Post date | 2005-01-14T19:14:38Z |
Fabricio Araujo wrote:

> Hmmmm.... You know this beast much better than me, Ann...
> But what if she performs a "select count(*) from t1" style
> query in the same txn where she made the massive delete?
> Could this help?

Not in the same transaction, no. First the delete must be committed,
then all snapshot transactions that were active concurrently with the
transaction that did the delete must end. Only then can the records be
garbage collected. The right solution, which is implemented in Firebird
2, is to change the way index duplicates are represented so they can be
garbage collected efficiently (see below). The next best solution is
for the database designer to avoid indexes with lots of duplicates. If
concurrency requirements allow, another option is to drop the
index, run the count(*), then recreate the index. Running a sacrificial
count(*) on the table will tie up the server, but if it's done at a
time when that's not going to interfere with valuable work, that's OK.
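As a concrete illustration, here is what that drop-count-recreate workaround might look like from a Python client. This is a sketch only: the fdb driver, connection details, index name, and column are illustrative assumptions, not part of the original advice.

```python
import fdb  # assumed Firebird driver; connection details are placeholders

con = fdb.connect(dsn='localhost:/data/mydb.fdb',
                  user='SYSDBA', password='masterkey')
cur = con.cursor()

# 1. Drop the low-selectivity index (idx_t1_status is a hypothetical
#    name) so the sweep below doesn't walk long duplicate chains.
cur.execute("DROP INDEX idx_t1_status")
con.commit()

# 2. Sacrificial count(*): visiting every record lets the engine
#    garbage-collect the committed deletes as it reads.
cur.execute("SELECT COUNT(*) FROM t1")
print("remaining rows:", cur.fetchone()[0])
con.commit()

# 3. Recreate the index once cleanup is done.
cur.execute("CREATE INDEX idx_t1_status ON t1 (status)")
con.commit()
con.close()
```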
Here's the problem with duplicates, as I understand it. When you store
a record, the data goes on a data page and an entry is stored in
each index. The algorithm for finding space on data pages tends to put
new data after existing data; not always, but usually. When an index
entry is a duplicate, it's stored at the beginning of the chain of
duplicates, so the most recently stored index entry is first.
A delete, whether indexed or not, will start on the first page that has
records to delete and end at the last, so, in general, the oldest record
is deleted first. As part of expunging a record, the system looks for
the index entries that must be deleted. That means the first record to
be deleted is likely to match the last index entry in its duplicate
chain, the second record the next-to-last entry, and so on: each expunge
walks nearly the whole chain before finding its entry, so cleaning up n
duplicates costs on the order of n squared comparisons.
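To make that cost concrete, here is a small self-contained sketch (plain Python, a toy model rather than Firebird's actual B-tree code) of a duplicate chain built newest-first and then deleted oldest-first, counting how many entries must be examined:

```python
def removal_cost(n: int) -> int:
    """Count entry comparisons needed to delete n duplicates oldest-first."""
    chain = []
    for rec in range(n):         # store records oldest (0) to newest (n-1)
        chain.insert(0, rec)     # each new duplicate entry goes to the head
    comparisons = 0
    for rec in range(n):         # a mass delete expunges the oldest first
        for pos, entry in enumerate(chain):
            comparisons += 1
            if entry == rec:     # the oldest entry sits near the tail
                del chain[pos]
                break
    return comparisons

for n in (10, 100, 1000):
    print(n, removal_cost(n))    # grows roughly as n*n/2
```

Doubling the number of duplicates roughly quadruples the work, which is why a mass delete against a heavily duplicated index can tie up the server for so long.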
Regards,

Ann