Subject: Re: [ib-support] limitations/maximum specs
Author: Ann W. Harrison
Post date: 2001-06-13T19:09:29Z
At 07:59 PM 6/13/2001 +0200, Ivan Prenosil wrote:
>The problem is as follows (correct me where I am wrong):
>- each row in table is located by db_key, which is 64-bit value,

Right.

>- db_key consists of 32-bits representing table-id, and
> 32-bits representing "record number".

Right. We could restructure that in the future since few
databases actually have more than 64K tables.
>- so "potential" address space for rows in table is 2^32,Right.
> but this theoretical maximum can never be reached,
>- because 32-bit record number is further structured, and consists of
> -pointer page number

It's actually a relative pointer page number - the sequence of this
pointer page in the list of pointer pages for the table.
> -offset on that pointer page

Right.
> -offset into pointer to row on data page

Right - the offset on the data page that contains the page index
entry for the data. The page index is a variable length array of
4 byte entries, each of which contains the offset and length of an
entry on the page. Every record, blob, back version, and fragment
on a page has an entry in the page index.
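
For the curious, each entry is just two 16 bit halves. From memory,
the declaration in the ODS header is essentially this (take the exact
names with a grain of salt):

typedef unsigned short USHORT;

struct dpg_repeat
{
    USHORT dpg_offset;    /* byte offset of the entry on the data page */
    USHORT dpg_length;    /* length of the entry in bytes */
};

That's the same dpg_repeat that turns up in the max_records
computation below.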
>- I am not familiar enough with IB source code yet,
> so I could not find algorithm for "(un)packing" these three parts
> (from)into single 32-bit record number value, but I am sure it is not
> as simple as e.g. [10-bits pointer page, 10-bits offset, etc...]

Right. Here's the code that takes apart the second part of the key:
DECOMPOSE (rpb->rpb_number, dbb->dbb_max_records, sequence, line);
DECOMPOSE (sequence, dbb->dbb_dp_per_pp, pp_sequence, slot);
Now you probably want to know how the DECOMPOSE macro expands:
#define DECOMPOSE(n, divisor, q, r) {r = n % divisor; q = n / divisor;}
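
If you want to play with it, here's a small standalone sketch. The 58
and 230 are not constants from the source - they're plausible values
for a 1024 byte page (230 matches the figure Ivan quotes below; 58 is
inferred from the db_key gaps in his experiment):

#include <stdio.h>

#define DECOMPOSE(n, divisor, q, r) {r = n % divisor; q = n / divisor;}

int main (void)
{
    /* Illustrative values only - both are really computed at attach
       time from the page size, as explained below. */
    unsigned max_records = 58;      /* record numbers per data page */
    unsigned dp_per_pp   = 230;     /* data pages per pointer page  */
    unsigned number = 12345;        /* a record number to take apart */
    unsigned sequence, line, pp_sequence, slot;

    DECOMPOSE (number, max_records, sequence, line);
    DECOMPOSE (sequence, dp_per_pp, pp_sequence, slot);
    printf ("pointer page %u, slot %u, line %u\n",
            pp_sequence, slot, line);

    /* Putting the pieces back together shows the record number is
       just mixed-radix arithmetic over the three parts: */
    printf ("recomposed %u\n",
            (pp_sequence * dp_per_pp + slot) * max_records + line);
    return 0;
}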
And being that sort of people, you probably want to have
dbb->dbb_dp_per_pp and dbb->dbb_max_records explained. The
dbb is the database block - the granddaddy block, usually wrapped
in a tdbb (thread specific database block).
/* Compute the number of data pages per pointer page. Each data page
requires a 32 bit pointer and a 2 bit control field. */
dbb->dbb_dp_per_pp = (dbb->dbb_page_size - OFFSETA (PPG, ppg_page)) * 8 /
(BITS_PER_LONG + 2);
OFFSETA is a macro that takes alignment into account when computing
the position of a structure given a starting point... PPG is the
header of a pointer page, and ppg_page is the start of the data.
The extra two bits are free space indicators which are stored at
the bottom of the pointer page - two bits per data page.
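
To put numbers on it, assuming a 32 byte pointer page header (the
exact size depends on the ODS version), a 1024 byte page gives

    (1024 - 32) * 8 / (32 + 2) = 7936 / 34 = 233

data pages per pointer page - which squares with the "about 230 for
1024 page_size" figure Ivan quotes below.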
And max_records:
dbb->dbb_max_records = (dbb->dbb_page_size - sizeof (struct dpg)) /
(sizeof (struct dpg_repeat) + OFFSETA (RHD, rhd_data));
the dpg struct is the header of a data page. The dpg_repeat structure
is the page index entry - there's always one for every entry on a data
page. RHD is the record header structure, rhd_data is where the data
begins.
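
Again with illustrative sizes - say a 24 byte data page header and a
13 byte record header, neither guaranteed - a 1024 byte page gives

    (1024 - 24) / (4 + 13) = 1000 / 17 = 58

record numbers per data page. Keep that 58 in mind: it's exactly the
stride in Ivan's experiment below, where db_keys run 1..0x23 (35) on
the first page and the next batch starts at 0x3B (59).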
>"offset into pointer to row on data page" has its maximumAs the code above shows (sic) Firebird is assuming a zero length
> for given page length, because only limited number of rows can fit
> on data page (e.g. max. 35 records for 1024 page_size).
record.
>"offset on pointer page" has also its maximum for given page lengthdbb->dbb_dp_per_pp
> (about 230 for 1024 page_size)
>"pointer page numbers" are chained and are given consecutive valuesThey are normally dense, but can have holes if an entire pointer page
> (that can be found in rdb$pages table).
has gone empty and been released.
>- now after some theory let's make little experiment:
> -create database
> -disable reserving space for inserted rows (by gfix -use full)
> -create table with short rows, e.g. create table tab(x char(1));
> -fill that table, e.g. by stored procedure with loop
> -look at db_keys: select rdb$db_key from tab;
>
> -you will find out that even if you use shortest rows possible
> there are quite large gaps in db_key numbers.
> e.g. for page size 1024 max. 35 rows will fit on one data page,
> so you can see db_keys
> 1,2,3, ... 22,23 ( 23(16)=35(10) )
> and next "batch" of rows is numbered
> 3B,3C, ... 5C,5D ( 3B(16)=59(10) )
>
> as you can see, db_keys 24..3A (i.e. 23 values) are lost!
> (other page sizes are not much better, you will see similar
> "wasting" of record number address space).
> (Surprisingly, after inserting some amount of rows,
> the gaps gets even larger!)

Right - because the algorithm is assuming a zero length record.

>So, am I really completely wrong, or is the algorithm for assigning
>record numbers not_as_good_as_it_could_be ?

That's true. The actual minimum length record is two bytes - one
byte for a single char field and one byte to hold the null flag,
assuming both are binary zeros so the compression turns it into
one byte containing -2, and one byte containing a binary zero.
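
For anyone who hasn't met the record compression: as the -2 above
suggests, it's simple run length encoding - a negative control byte
means "repeat the next byte -n times", a positive one means "the next
n bytes are literal". A minimal sketch of that scheme (my own, not
the production encoder):

#include <stdio.h>

/* Encode in_len bytes of in into out, returning the encoded length.
   Negative control byte: repeat next byte -n times.
   Positive control byte: next n bytes are literal. */
static int rle_encode (const signed char *in, int in_len, signed char *out)
{
    int o = 0, i = 0;
    while (i < in_len)
    {
        int run = 1;
        while (i + run < in_len && in[i + run] == in[i] && run < 127)
            run++;
        if (run > 1)
        {
            out[o++] = (signed char) -run;   /* repeat count */
            out[o++] = in[i];                /* the repeated byte */
            i += run;
        }
        else
        {
            /* gather a stretch of non-repeating bytes */
            int start = i;
            while (i < in_len && (i + 1 >= in_len || in[i + 1] != in[i])
                   && i - start < 127)
                i++;
            out[o++] = (signed char) (i - start);
            for (; start < i; start++)
                out[o++] = in[start];
        }
    }
    return o;
}

int main (void)
{
    /* A char(1) of binary zero plus its zero null flag... */
    signed char rec[2] = { 0, 0 }, buf[8];
    int n = rle_encode (rec, 2, buf);
    printf ("%d bytes: %d %d\n", n, buf[0], buf[1]);  /* 2 bytes: -2 0 */
    return 0;
}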
A more reasonable algorithm would use a different max_records value
for each table, taking a plausible compression ratio into account.
Personally, I'd rather steal a couple bytes from the table portion
of the db_key, though that's going to have repercussions everywhere.
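
Just to illustrate that idea - these numbers are hypothetical, and
nothing like this is in the source - stealing two bytes from the
table half of the key would look like:

typedef unsigned long long dbkey_t;   /* the 64 bit db_key */

/* Hypothetical 16/48 split: 64K tables (which, per the remark above,
   is plenty) and 65536 times the record number space per table. */
#define MAKE_DBKEY(table, record) \
        (((dbkey_t)(table) << 48) | ((dbkey_t)(record) & 0xFFFFFFFFFFFFULL))
#define DBKEY_TABLE(key)  ((unsigned)((key) >> 48))
#define DBKEY_RECORD(key) ((dbkey_t)(key) & 0xFFFFFFFFFFFFULL)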
>In one older message, Jim Starkey wrote:
>=====
>So to compute a reliable record number from (point page#, data page#,
>line#), you need to know the maximum records per data page, and that's
>a problem. There is no nice trade-off here. You can decide to use
>the maximum number of minimum size records (theoretical minimum),
>which wastes record number space (every data page with few records
>leaves holes in the number space) or pick an arbitrary number out of
>hat, which leaves the potential for wasting data page space because
>a page ran out of line numbers.
>=====
>It seems to me that IB uses even higher value than maximum in
>"maximum number of minimum size records (theoretical minimum)"

Regards,

Ann
www.ibphoenix.com
We have answers.