Subject Re: [ib-support] limitations/maximum specs
Author ian
Thanks, it's all so much clearer now...!

Max number of records per table? Somewhere below 2^32, as Borland
(www.borland.com/interbase/tech_specs.html) have it, but record counts
in the billions are not "impossible"? (especially in a
32TB gdb!)

"Ann W. Harrison" wrote:
>
> At 07:59 PM 6/13/2001 +0200, Ivan Prenosil wrote:
>
> >The problem is as follows (correct me where I am wrong):
> >- each row in a table is located by its db_key, which is a 64-bit value,
>
> Right.
>
> >- db_key consists of 32 bits representing the table id and
> > 32 bits representing the "record number".
>
> Right. We could restructure that in the future, since few
> databases actually have more than 64K tables.
>
> >- so the "potential" address space for rows in a table is 2^32,
> > but this theoretical maximum can never be reached,
>
> Right.
>
> >- because the 32-bit record number is further structured and consists of
> > -pointer page number
>
> It's actually a relative pointer page number - the sequence of this
> pointer page in the list of pointer pages for the table.
>
> > -offset on that pointer page
>
> Right.
>
> > -offset into pointer to row on data page
>
> Right - the offset on the data page that contains the page index
> entry for the data. The page index is a variable-length array of
> 4-byte entries, each of which contains the offset and length of one
> item on the page. Every record, blob, back version, and fragment on a
> page has an entry in the page index.
>
> >- I am not familiar enough with the IB source code yet,
> > so I could not find the algorithm for "(un)packing" these three parts
> > (from) into a single 32-bit record number value, but I am sure it is not
> > as simple as e.g. [10-bit pointer page, 10-bit offset, etc...]
>
> Right. Here's the code that takes apart the second part of the key:
>
> DECOMPOSE (rpb->rpb_number, dbb->dbb_max_records, sequence, line);
> DECOMPOSE (sequence, dbb->dbb_dp_per_pp, pp_sequence, slot);
>
> Now you probably want to know how the DECOMPOSE macro expands:
>
> #define DECOMPOSE(n, divisor, q, r) {r = n % divisor; q = n / divisor;}
>
> And being that sort of people, you probably want to have
> dbb->dbb_dp_per_pp and dbb->dbb_max_records explained. The
> dbb is the database block - the granddaddy block, usually wrapped
> in a tdbb (thread-specific database block).
>
> /* Compute the number of data pages per pointer page. Each data page
> requires a 32 bit pointer and a 2 bit control field. */
>
> dbb->dbb_dp_per_pp = (dbb->dbb_page_size - OFFSETA (PPG, ppg_page)) * 8 /
> (BITS_PER_LONG + 2);
>
> OFFSETA is a macro that takes alignment into account when computing
> the position of a structure given a starting point... PPG is the
> header of a pointer page, and ppg_page is the start of the data.
> The extra two bits are free space indicators which are stored at
> the bottom of the pointer page - two bits per data page.
>
> And max_records:
>
> dbb->dbb_max_records = (dbb->dbb_page_size - sizeof (struct dpg)) /
> (sizeof (struct dpg_repeat) + OFFSETA (RHD, rhd_data));
>
> The dpg struct is the header of a data page. The dpg_repeat structure
> is the page index entry - there's always one for every entry on a data
> page. RHD is the record header structure, rhd_data is where the data
> begins.
>
> >"offset into pointer to row on data page" has a maximum
> > for a given page size, because only a limited number of rows can fit
> > on a data page (e.g. max. 35 records for 1024 page_size).
>
> As the code above shows (sic), Firebird is assuming a zero-length
> record: the divisor counts only the page index entry and the record
> header, with no room for any data.
>
> >"offset on pointer page" also has a maximum for a given page size
> > (about 230 for 1024 page_size)
>
> dbb->dbb_dp_per_pp
>
> >"pointer page numbers" are chained and are given consecutive values
> > (they can be found in the rdb$pages table).
>
> They are normally dense, but can have holes if an entire pointer page
> has gone empty and been released.
>
> >- now after some theory, let's do a little experiment:
> > -create database
> > -disable reserving space for inserted rows (by gfix -use full)
> > -create table with short rows, e.g. create table tab(x char(1));
> > -fill that table, e.g. by stored procedure with loop
> > -look at db_keys: select rdb$db_key from tab;
> >
> > -you will find that even if you use the shortest possible rows
> > there are quite large gaps in db_key numbers.
> > e.g. for page size 1024 max. 35 rows will fit on one data page,
> > so you can see db_keys
>
> Right - because the algorithm is assuming a zero length record.
>
> > 1,2,3, ... 22,23 ( 23(16)=35(10) )
> > and next "batch" of rows is numbered
> > 3B,3C, ... 5C,5D ( 3B(16)=59(10) )
> >
> > as you can see, db_keys 24..3A (i.e. 23 values) are lost!
> > (other page sizes are not much better, you will see similar
> > "wasting" of record number address space).
> > (Surprisingly, after inserting a certain number of rows,
> > the gaps get even larger!)
> >
> >So, am I really completely wrong, or is the algorithm for assigning
> >record numbers not_as_good_as_it_could_be ?
>
> That's true. The actual minimum length record is two bytes - one
> byte for a single char field and one byte to hold the null flag,
> assuming both are binary zeros so the compression turns it into
> one byte containing -2, and one byte containing a binary zero.
>
> A more reasonable algorithm would use different max_records
> values for different tables, based on a plausible compression ratio.
> Personally, I'd rather steal a couple of bytes from the table portion
> of the db_key, though that's going to have repercussions everywhere.
>
> > In one older message, Jim Starkey wrote:
> >=====
> >So to compute a reliable record number from (pointer page#, data page#,
> >line#), you need to know the maximum records per data page, and that's
> >a problem. There is no nice trade-off here. You can decide to use
> >the maximum number of minimum size records (theoretical minimum),
> >which wastes record number space (every data page with few records
> >leaves holes in the number space) or pick an arbitrary number out of
> >a hat, which leaves the potential for wasting data page space because
> >a page ran out of line numbers.
> >=====
> >It seems to me that IB uses an even higher value than the maximum in
> >"maximum number of minimum size records (theoretical minimum)"
>
> Regards,
>
> Ann
> www.ibphoenix.com
> We have answers.


Thanks for your quick responses, I love newsgroups,
Ian

PS: my posts still aren't showing on the egroups.ib-support list, but they
appear to get answers, so they are going somewhere?