Subject Re: [firebird-support] Calcurate row size
Author Ann W. Harrison
At 01:13 AM 7/13/2004, Listeny wrote:
>Hi,
>
> Would someone tell me resources/documents of 'how to calculate number of
>bytes on each row' for FB1.5 ?

Believe it or not, that's a hard question. You can start with the easy part.

integer 4 bytes
short 2 bytes
int64 8 bytes

numeric (1,x) to numeric (4,x) 2 bytes
numeric (5,x) to numeric (9,x) 4 bytes
numeric (10,x) to numeric (18,x) 8 bytes
numeric > (18,x) 8 bytes

decimal (1,x) to decimal (4,x) 2 bytes
decimal (5,x) to decimal (9,x) 4 bytes
decimal (10,x) to decimal (18,x) 8t bytes
decimal > (18,x) 8 bytes

float 4 bytes
double 8 bytes

char(n) n bytes
varchar (n) n+2 bytes

blob 8 bytes
array 8 bytes

Then it gets harder. Every record has a 13 byte header and a 4 byte index
entry on page. Fragmented records have an additional 4 bytes of fragment
header. For every field, there's a bit that indicates null status, so for a
record with n fields there are (n + 7)/8 bytes of null flags. The page
itself has 28 bytes of overhead, and 13 bytes of free space is reserved for
each record stored in a page.

Then you ought to consider the actual blob/array sizes. Although blob or
array fields take up only 8 bytes in the record, they are stored without
compression so however big they are, they take that much space somewhere,
plus a certain amount of overhead. In the case of blobs and arrays, the
overhead is negligible.

Then it gets a bit harder still. Depending on the hardware and operating
system architectures, the record includes some number of pad bytes so that
data ends up aligned correctly on word, long, or quad boundaries.

Then it gets data dependent. The entire record excluding the header (13 or
17 bytes, depending on fragmentation) is subject to run length compression,
using a single byte to indicate run length. For a record that has no
duplicate bytes, you must add one byte for every 128 bytes of data
length. For a record with more than two adjacent bytes with the same
value, add two bytes and subtract the length of the run of identical
values. The compression does not recognize field values, so a four int64
values all containing the value 0 will compress from 24 bytes to 2.

Then you might want to consider the index overhead... or maybe not.

Cheers,


Ann

P.S. A dbkey is eight bytes of which the first 4 currently contain the
relation id and the second 4 express the record number within the
database. In the relatively near future, the portion of the dbkey allotted
to the record number will increase to allow more records per table.

At the moment, a database is limited to 2**32 pages - probably actually
2**31 because of coding errors. A blob or array id concatenates four bytes
of page number and four bytes of offset on page.