Subject Re: Firebird 3.0 database page size
Author lmatusz
First of all, I am wondering whether my calculations are correct.
Here is a small explanation of my calculation:


They are based on assumptions taken from src/jrd/RecordNumber.h and other related source files (especially the ones defining the on-disk structures quoted below).

The 40 bit record number (64 bit internally) must be decomposed into 3 USHORTs (pp_sequence, slot, line).

pp_sequence is just the number of a pointer page (MAX = 65535)

slot is the index into pointer_page::ppg_page (pointer_page::ppg_page[slot] is the page number of a data_page)
line is the index into data_page::dpg_rpt (data_page::dpg_rpt[line] is the on-page offset/length descriptor of a record)

The max slot number is computed from the maximum number of data_page(s) in one pointer_page.

maximum_number_of_data_pages_in_one_pointer_page = ( database_page_size - 32 ) / (17/4)

(32 is the overhead of pointer_page:

struct pointer_page
{
    pag ppg_header;       // (16 bytes long)
    SLONG ppg_sequence;   // Sequence number in relation
    SLONG ppg_next;       // Next pointer page in relation
    USHORT ppg_count;     // Number of slots active
    USHORT ppg_relation;  // Relation id
    USHORT ppg_min_space; // Lowest slot with space available
    USHORT ppg_max_space; // Highest slot with space available
    SLONG ppg_page[1];    // Data page vector (the fields above total 32 bytes without ppg_page)
};

17/4 = 4 + 2/8, i.e. 4.25 bytes per slot: 4 bytes is the size of one SLONG ppg_page entry, and 2/8 byte comes from the 2 bits kept for every ppg_page slot in the fill bitmap stored at the end of the page.
)
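If I got this right, here is a quick sketch to check these numbers for a few page sizes (the helper name is mine, not from the Firebird sources):

#include <cstdio>
#include <initializer_list>

// Formula from above: usable bytes = database_page_size - 32 (pointer_page
// header), and every slot costs 4 bytes (SLONG ppg_page entry) plus 2 bits
// in the fill bitmap, i.e. 34 bits = 17/4 bytes per slot.
static unsigned dataPagesPerPointerPage(unsigned databasePageSize)
{
    const unsigned usableBytes = databasePageSize - 32;
    return usableBytes * 8 / 34;   // same as usableBytes / (17/4), rounded down
}

int main()
{
    for (unsigned pageSize : { 1024u, 2048u, 4096u, 8192u, 16384u })
        std::printf("%5u-byte page: %u data pages per pointer page\n",
                    pageSize, dataPagesPerPointerPage(pageSize));
    return 0;
}

For an 8K page this gives 1920, which is in line with the ~2000 mentioned in the comments of the functions quoted further below.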

The max line number is computed from the maximum number of records per data page.

maximum_records_per_data_page = (database_page_size - 24) / (compressed_record_size_with_overhead + 4)

(24 is the overhead of data_page:

struct data_page
{
    pag dpg_header;      // (16 bytes long)
    SLONG dpg_sequence;  // Sequence number in relation
    USHORT dpg_relation; // Relation id
    USHORT dpg_count;    // Number of record segments on page
    struct dpg_repeat    // (the fields above total 24 bytes without dpg_rpt)
    {
        USHORT dpg_offset; // Offset of record fragment
        USHORT dpg_length; // Length of record fragment
    } dpg_rpt[1];
};

4 is taken from sizeof(dpg_repeat). The record itself is also stored on the data page, which is why the formula above uses compressed_record_size_with_overhead.)
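And the same kind of sketch for the second formula (the names and the 16-byte compressed record size are just example assumptions of mine):

#include <cstdio>

// Formula from above: usable bytes = database_page_size - 24 (data_page
// header), and every record costs its compressed size plus 4 bytes for
// the dpg_repeat descriptor (dpg_offset + dpg_length).
static unsigned recordsPerDataPage(unsigned databasePageSize,
                                   unsigned compressedRecordSizeWithOverhead)
{
    const unsigned usableBytes = databasePageSize - 24;
    return usableBytes / (compressedRecordSizeWithOverhead + 4);
}

int main()
{
    // e.g. ~16 bytes of compressed record data per row on an 8K page
    std::printf("8K page, 16-byte records: %u records per data page\n",
                recordsPerDataPage(8192, 16));   // (8192 - 24) / 20 = 408
    return 0;
}

That is close to the ~400 per 8K page mentioned in those same comments.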

The maximum_number_of_data_pages_in_one_pointer_page depends only on database_page_size.

The maximum_records_per_data_page, as the name suggests, depends on the compressed record size with overhead and on database_page_size.

That is where these numbers come from.
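Putting both formulas together with the 65535 limit on pp_sequence gives the maximum record number a table can address, which is the same bound that checkNumber() (quoted further below) enforces. A rough sketch with the 8K-page example values (the 16-byte record size is again just my assumption):

#include <cstdint>
#include <cstdio>

int main()
{
    // Example values for an 8K page, taken from the formulas above
    // (records_per_data_page assumes a 16-byte compressed record).
    const std::uint64_t max_pp_sequence       = 65535; // pp_sequence is a USHORT
    const std::uint64_t data_pages_per_pp     = 1920;  // (8192 - 32) * 8 / 34
    const std::uint64_t records_per_data_page = 408;   // (8192 - 24) / (16 + 4)

    // Same bound as checkNumber(): MAX_USHORT * records_per_page *
    // data_pages_per_pointer_page, additionally capped at 2^40.
    std::uint64_t max_records =
        max_pp_sequence * records_per_data_page * data_pages_per_pp;
    const std::uint64_t forty_bit_limit = std::uint64_t(1) << 40;
    if (max_records > forty_bit_limit)
        max_records = forty_bit_limit;

    std::printf("max addressable records: %llu\n",
                static_cast<unsigned long long>(max_records));
    return 0;
}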

> If you are interested to calculate it correctly, read this first http://firebirdsql.org/manual/fb-internals.html

Yes, I have read it, but I have also read the source code of Firebird SQL Database 2.5.

If you don't have the sources, the mentioned functions are here:

inline bool checkNumber(USHORT records_per_page, // ~400 (8k page)
USHORT data_pages_per_pointer_page) const // ~2000 (8k page)
{
// We limit record number value to 40 bits and make sure decomposed value
// fits into 3 USHORTs. This all makes practical table size limit (not
// counting allocation threshold and overhead) roughly equal to:
// 16k page - 20000 GB
// 8k page - 10000 GB
// 4k page - 2500 GB
// 2k page - 600 GB
// 1k page - 150 GB
// Large page size values are recommended for large databases because
// page allocators are generally linear.
return value < QUADCONST(0x10000000000) &&
value < (SINT64) MAX_USHORT * records_per_page * data_pages_per_pointer_page;
}

inline void decompose(USHORT records_per_page, // ~400 (8k page)
USHORT data_pages_per_pointer_page, // ~2000 (8k page)
SSHORT& line,
SSHORT& slot,
USHORT& pp_sequence) const
{
// Use explicit casts to suppress 64-bit truncation warnings
line = static_cast<SSHORT>(value % records_per_page);
const ULONG sequence = static_cast<ULONG>(value / records_per_page);
slot = sequence % data_pages_per_pointer_page;
pp_sequence = sequence / data_pages_per_pointer_page;
}

inline void compose(USHORT records_per_page, // ~400 (8k page)
USHORT data_pages_per_pointer_page, // ~2000 (8k page)
SSHORT line,
SSHORT slot,
USHORT pp_sequence)
{
value = (((SINT64) pp_sequence) * data_pages_per_pointer_page + slot) * records_per_page + line;
}
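To see the round trip in action, here is a standalone sketch of the same arithmetic (not using the Firebird class itself; the 408/1920 values are the 8K-page examples from above):

#include <cassert>
#include <cstdint>
#include <cstdio>

int main()
{
    // Example values for an 8K page (see the formulas above).
    const std::uint32_t records_per_page            = 408;
    const std::uint32_t data_pages_per_pointer_page = 1920;

    const std::int64_t value = 1234567890;   // some 40-bit record number

    // Same arithmetic as decompose():
    const std::uint32_t line        = value % records_per_page;
    const std::uint64_t sequence    = value / records_per_page;
    const std::uint32_t slot        = sequence % data_pages_per_pointer_page;
    const std::uint32_t pp_sequence = sequence / data_pages_per_pointer_page;

    // Same arithmetic as compose():
    const std::int64_t recomposed =
        ((std::int64_t) pp_sequence * data_pages_per_pointer_page + slot)
            * records_per_page + line;

    assert(recomposed == value);
    std::printf("pp_sequence=%u slot=%u line=%u -> %lld\n",
                (unsigned) pp_sequence, (unsigned) slot, (unsigned) line,
                (long long) recomposed);
    return 0;
}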


> Do you really have that much data?
>

I will be gathering URIs from amazon.com to find m3u files there. There are around 15 million songs there, and every song is linked with 3 or more pages. There are also other things on amazon.com, like books (about 31 million of them, each linked with even more pages), computer hardware, and lots of other stuff. It would be great if all of it fit in one table dedicated to amazon.com. I do not have an estimate of the number of pages on amazon.com, but it will be large enough.

My solution for this is to create a blob that will hold more than one URI and will be around 32 kB in size (or a little below that value).
So my table will look more like this:

CREATE TABLE amazoncom (
PK BIGINT NOT NULL,
URI BLOB SUB_TYPE 1
);
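For the packing itself, here is a minimal sketch of what I mean (newline-separated URIs, chunks just under 32 kB; the function name is mine):

#include <cstddef>
#include <string>
#include <vector>

// Pack newline-separated URIs into blobs of at most maxBlobSize bytes,
// so each row of the table holds many URIs instead of one.
std::vector<std::string> packUris(const std::vector<std::string>& uris,
                                  std::size_t maxBlobSize = 32 * 1024)
{
    std::vector<std::string> blobs;
    std::string current;
    for (const std::string& uri : uris)
    {
        // +1 for the newline separator
        if (!current.empty() && current.size() + uri.size() + 1 > maxBlobSize)
        {
            blobs.push_back(current);
            current.clear();
        }
        if (!current.empty())
            current += '\n';
        current += uri;
    }
    if (!current.empty())
        blobs.push_back(current);
    return blobs;   // each entry becomes one URI blob (SUB_TYPE 1 = text)
}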

Yes, I am gathering links from web pages.

Best regards
Lukasz Matuszewski