Subject | Generator bugfix RFP |
---|---|
Author | Charlie Caro |
Post date | 2000-10-23T15:52:02Z |
Let's roll up our sleeves and place our partisan politics aside. This is
a severity 1 bug which affects every version of InterBase since the V3
introduction of generators. The final solution might require an ODS
change so it's important that both code bases come to an accord on a
solution with minimal impact to customers and users.
Ann, you may want to give Jim a tug on the sleeve or whatever
you do to get his attention these days.
1. BUG
I think the basic bug was reported against V6, something like
the following: If more than 116 generators are created then
subsequent usage of those generators can corrupt the database.
Note: All examples below assume a 1KB database page size. You
can adjust the numbers for other database page sizes. The actual
numbers are approximate and rounded for easier comprehension.
2. REVIEW GENERATOR PAGE IMPLEMENTATION
The V3 implementation of generators defined them as a 32-bit
quantity. Since a pointer page consisted of an array of 32-bit
page numbers, it was decided to borrow the structure of a pointer
page to store generator values. The page itself was given its
own page type but otherwise it used the structure of the pointer
page.
When a database is opened, certain database geometries are computed
based on the database page size. One of those values is the number
of data pages per pointer page. As a simple example, assume the
usable space on a 1,024 byte page, after accounting for a boilerplate
page header is 1,000 bytes. A pointer page should be able to hold
250 32-bit page numbers. However, each page slot also has 2 bits of
control flags reserved at the end of the page. This reduces the number of
page slots to 235.
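As a rough sketch of that arithmetic (the helper name is made up for illustration and is not engine code; the 1,000 usable bytes are the rounded figure assumed above):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the InterBase sources.
   Each pointer-page slot costs 32 bits for the page number plus
   2 bits of control flags, so count how many whole slots fit. */
static size_t pointer_page_slots(size_t usable_bytes)
{
    const size_t bits_per_slot = 32 + 2;

    assert(usable_bytes > 0);
    return (usable_bytes * 8) / bits_per_slot;
}
```

Calling pointer_page_slots(1000) gives the 235 slots quoted above, against the 250 you would get if the control flags were ignored.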
The engine keeps a vector of generator pages in "dbb->dbb_gen_id_pages".
To find the actual page number, the relative generator ID is decomposed
into a <sequence, offset> pair. The sequence no. indexes this vector for
the actual page number while the offset is used for a scaled access to
the generator value on that page. Here's the computation from
jrd/dpm.e/DPM_gen_id():
    sequence = generator / dbb->dbb_pcontrol->pgc_ppp;
    offset = generator % dbb->dbb_pcontrol->pgc_ppp;
and from jrd/pag.h (pgc_ppp):
    /* Page control block -- used by PAG to keep track of critical
       constants */
    typedef struct pgc {
        struct blk pgc_header;
        SLONG pgc_high_water;   /* Lowest PIP with space */
        SLONG pgc_ppp;          /* Pages per pip */
        SLONG pgc_pip;          /* First pointer page */
        int pgc_bytes;          /* Number of bytes of bit in PIP */
        int pgc_tpt;            /* Transactions per TIP */
    } *PGC;
The source of the bug now becomes obvious when you consider that
V6 generators are 64 bits wide but the <sequence, offset> pair is
computed based on a 32-bit quantity (page numbers on pointer pages).
So only 117 V6 generators can fit on a 1KB page although 235 page
numbers can still be stored on a page. The per-page count used to
compute the <sequence, offset> components should be halved.
V6 generator IDs 0-116 are safely updating on the page while 117-233 are
unsafely updating values beyond the ending boundary of a generator
page. The pattern repeats: IDs 234-350 are safe while 351-467 are
unsafe ...
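To make the pattern concrete, here is a minimal sketch of the decomposition with a bounds check bolted on. The type and function names are hypothetical and nothing here is lifted from dpm.e. The divisor (the count the engine divides by) and the capacity (how many generator values really fit on the page) are passed in separately so the mismatch is visible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration type: where a generator ID lands. */
typedef struct {
    uint32_t sequence;  /* index into the vector of generator pages */
    uint32_t offset;    /* slot number on that page */
} gen_slot;

/* Same shape as the computation quoted above, with the divisor passed in. */
static gen_slot gen_decompose(uint32_t generator, uint32_t per_page)
{
    gen_slot slot;

    assert(per_page > 0);
    slot.sequence = generator / per_page;
    slot.offset = generator % per_page;
    return slot;
}

/* Nonzero when the slot lies inside the page, i.e. when the offset is
   below the number of generator values that actually fit on it. */
static int gen_slot_in_bounds(uint32_t generator, uint32_t per_page,
                              uint32_t capacity)
{
    return gen_decompose(generator, per_page).offset < capacity;
}
```

With an assumed divisor of 234 and a capacity of 117, in line with the rounded 1KB figures, IDs 116 and 234 come out in bounds while 117 and 233 do not.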
3. IF ONLY THE BUG STOPPED HERE
... the situation would be bad but at least V6 is not certified
(although I know some users are using it for production). The
majority of users are pre-V6 so you would think the exposure is
contained.
Not so.
Remember the database constant "pgc_ppp" (from the page control block
above) used to calculate the <sequence, offset> pair from the generator
ID value. That doesn't denote the number of data page numbers stored on
a pointer page. It denotes the number of pages whose
allocated/unallocated state can be stored on a page inventory page
(pip). Since 1 bit is used for each page, this constant evaluates to
about 8,000 on a 1KB database page size. The correct constant is stored
as "dbb_dp_per_pp" off the database block.
This means that since V3 only generator IDs 0-249 have been safe
while IDs 250-7,999 have generated unsafe updates and random values.
Again, 8,000-8,249 would be safe, etc.
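The pre-V6 ranges fall out of the same kind of arithmetic. The sketch below uses hypothetical helper names and the rounded assumption of 1,000 usable bytes per page. It takes one bit per page for the PIP count and four bytes per 32-bit generator value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only, not engine code. */

/* Pages whose state one PIP can track: one bit per page. */
static uint32_t pages_per_pip(uint32_t usable_bytes)
{
    assert(usable_bytes > 0);
    return usable_bytes * 8u;
}

/* 32-bit generator values that physically fit in the same space. */
static uint32_t gen32_capacity(uint32_t usable_bytes)
{
    return usable_bytes / 4u;
}

/* Nonzero when a pre-V6 generator ID, decomposed with the pages-per-PIP
   divisor, still lands inside its generator page. */
static int pre_v6_id_is_safe(uint32_t generator, uint32_t usable_bytes)
{
    return generator % pages_per_pip(usable_bytes)
           < gen32_capacity(usable_bytes);
}
```

For 1,000 usable bytes the divisor is 8,000 and the capacity is 250, so under these assumptions only the first 250 IDs of every 8,000 stay on the page.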
The bug may be more noticeable today because users are concocting more
novel uses for generators than were originally conceived when they were
designed (GUID generation, temp. table IDs ...). Users are also creating
and trying to destroy them in dynamic ways, hence the often requested
DDL statement DROP GENERATOR. Also, databases are staying online longer
without backup/restore so that dynamic generator creation is raising the
associative generator ID that is assigned to a user-specified generator
name.
4. WHAT USERS CAN DO
Users can query RDB$GENERATORS to see the generator IDs associated with
their named generators. Any ID > 116 for a database with a 1KB page
size doesn't exist and never has. You referenced an arbitrary value on
some other database page, an internal server memory structure, or
incurred an access violation which appeared spurious to you at the time.
Obviously, raising the database page size will raise the threshold
at which this bug appears. It should be noted that a backup/restore
will renumber the associative generator IDs (not the generator value)
starting at 1.
After we agree on a technical solution, customers can back up their
databases. If they back up the database with the defective engine,
they will get random generator values which will be restored under the
corrected engine. If they backup/restore with the corrected engine, I
believe the existing code will dynamically sense an unallocated
generator page, dynamically allocate it, and return a generator value of
0 (zero).
Since generators are not a database type, there is no easy way to tell
which columns in which tables have been populated with bogus values. The
reset generator values may cause duplicate errors. There is no way to
salvage generator values that never really existed in the first place.
The best we can do is fix the bug to work correctly -- users will have
to scrub their databases themselves.
Another solution may be for customers to overload generator IDs 1-116
with the generators whose IDs are greater than 116 by manually updating
the RDB$GENERATORS system table. This will produce gaps in generator
values for their original use and may still produce duplicates in their
overloaded use. However, it will prevent data corruption of surrounding
pages and random failure due to this bug.
5. SOLUTIONS
We need solutions for V6 and pre-V6. I know what I would do. Ann, Jim
and Dave S. and internal Inprise engineers, let's give others a few
days head start to participate in a solution that addresses customers,
technical issues and extensibility (i.e. how might DROP GENERATOR
someday be supported either at the ODS level or logically at the
RDB$GENERATORS system table level).
Regards,
Charlie