Subject Re: [IB-Architect] Disk data integrity (was: Torn Page detection)
Author Ivan Prenosil
Claudio wrote:
> > I guess I qualify as "other than Ann". The current engine doesn't do
> > anything to ensure disk data integrity except for a 'weak' page type
> > check on each fetch of a page. There's database shadowing and RAID
> > mirroring but that doesn't help if the page is corrupted in memory and
> > then rewritten to disk.
> From what I understand, the engine only ensures the TIP is written after the
> page itself has been written (so there may be orphan pages but not dangling
> references to non-extant pages).

I believe it is more complex than just deferring the write of the TIP.
Every time IB needs to write a page that contains a pointer to another page,
it must choose the right order of write operations.

E.g. if you insert a new row into a table, and there is no free space in the gdb,
then IB
- allocates a new free page, and writes this information to disk
(to avoid double allocation of this page)
- writes your data to this page (and to disk)
- now that the page (containing system header and data) is written to disk,
the pointer to this page is added to the list of data pages belonging to the table.
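
The three steps above can be sketched as a toy model. This is only an
illustration of the ordering idea, not InterBase's actual code: ToyDisk,
insert_row_on_new_page and the crash_after parameter are hypothetical names,
and each write() stands for one forced write. The loop "crashes" after
0, 1, 2 and 3 writes and checks what is left on disk.

```python
# Toy model of careful write ordering; every name here is hypothetical.

class Crash(Exception):
    """Raised to simulate power loss before a write reaches disk."""


class ToyDisk:
    def __init__(self, crash_after=None):
        self.allocated = set()    # page-allocation information
        self.pages = {}           # page number -> page contents
        self.table_pages = []     # data pages belonging to the table
        self.writes_left = crash_after

    def write(self, action):
        # Each call models one forced write; it either happens fully or not at all.
        if self.writes_left is not None:
            if self.writes_left == 0:
                raise Crash()
            self.writes_left -= 1
        action()


def insert_row_on_new_page(disk, page_no, row):
    # 1. Record the allocation first, so the page cannot be handed out twice.
    disk.write(lambda: disk.allocated.add(page_no))
    # 2. Write the page itself (header plus data).
    disk.write(lambda: disk.pages.__setitem__(page_no, [row]))
    # 3. Only now add the pointer to the table's list of data pages.
    disk.write(lambda: disk.table_pages.append(page_no))


def is_consistent(disk):
    # No dangling pointers: every page the table lists is allocated and written.
    return all(p in disk.allocated and p in disk.pages for p in disk.table_pages)


def orphan_pages(disk):
    # Allocated pages that no table points to: wasted space, but harmless.
    return disk.allocated - set(disk.table_pages)


results = {}
for n in range(4):
    disk = ToyDisk(crash_after=n)
    try:
        insert_row_on_new_page(disk, 7, "row data")
    except Crash:
        pass
    results[n] = (is_consistent(disk), orphan_pages(disk))

for n, (ok, orphans) in results.items():
    print(n, ok, sorted(orphans))
```

In this model a crash at any point leaves at worst an orphan page
(allocated, but not referenced by the table), never a pointer to a page
that was not written.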

If you delete a row from a table, and this row was the last one on the page,
then that page is first "unbound" from the table, and then added to the pool of free pages
(and not vice versa, otherwise you can end up with a page that is both
part of the table and part of the pool of free pages).
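
A small sketch of why the order matters when a page is released. The
function and variable names are made up for the example; it simply records
the on-disk state after each of the two writes and looks for a page that is
owned twice.

```python
# Toy comparison of the two possible write orders when freeing a page.
# All names are hypothetical; each step models one write reaching disk.

def release_page(table_pages, free_pool, page_no, unbind_first=True):
    """Return the state seen on disk after each of the two writes."""
    table, free = set(table_pages), set(free_pool)
    steps = [
        ("unbind", lambda: table.discard(page_no)),  # remove page from the table
        ("free", lambda: free.add(page_no)),         # add page to the free pool
    ]
    if not unbind_first:
        steps.reverse()
    states = []
    for name, step in steps:
        step()
        states.append((name, frozenset(table), frozenset(free)))
    return states


def doubly_owned(states):
    # Names of the writes after which some page is both in the table and free.
    return [name for name, table, free in states if table & free]


good = release_page({3, 4}, {9}, 4)                      # unbind, then free
bad = release_page({3, 4}, {9}, 4, unbind_first=False)   # free, then unbind

print(doubly_owned(good))
print(doubly_owned(bad))
```

With the correct order a crash between the two writes leaves page 4 owned
by nobody (an orphan). With the reversed order the same crash leaves it in
both the table and the free pool, so it could be handed out again while the
table still points to it.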

> However, given enough file system's cache in the OS, this may not suffice
> as the cache may not be easily flushed after each write operation.

This is why there are "Forced writes" (also called "Synchronous writes").
Each write operation completes not when the data is written to the OS disk cache,
but when the write operation is acknowledged by the disk controller.
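
For illustration, this is roughly what a synchronous page write looks like
at the OS level on a POSIX-like system. It is a generic sketch, not how
InterBase implements forced writes; PAGE_SIZE, the file name and
write_page_forced are assumptions made up for the example. Whether the data
has really reached the platter still depends on the OS and on the drive's
own write cache.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size, chosen only for this example


def write_page_forced(fd, page_no, data):
    """Write one page and return only after the OS reports it flushed."""
    if len(data) != PAGE_SIZE:
        raise ValueError("data must be exactly one page")
    os.lseek(fd, page_no * PAGE_SIZE, os.SEEK_SET)
    if os.write(fd, data) != PAGE_SIZE:
        raise OSError("short write")
    # Ask the OS to push the data through its cache to the device.
    os.fsync(fd)


path = os.path.join(tempfile.mkdtemp(), "toy.gdb")  # hypothetical file
# O_SYNC, where the platform defines it, makes each write() synchronous;
# the explicit fsync() above covers platforms that lack it.
flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_SYNC", 0)
fd = os.open(path, flags, 0o600)
try:
    write_page_forced(fd, 0, b"H" * PAGE_SIZE)  # e.g. a header page
    write_page_forced(fd, 1, b"D" * PAGE_SIZE)  # e.g. a data page
finally:
    os.close(fd)

file_size = os.path.getsize(path)
print(file_size)
```

The point is only that the second write is not started until the first one
has been acknowledged, which is what makes a careful write order meaningful.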

> Also, the
> mechanism doesn't protect against records that span two or more pages.

With forced writes and correct write order you should not get severe
db corruption (only orphan pages and perhaps some index problems),
provided that each page is written to disk as a whole.
(If a record spans several pages, IB should write its tail first;
then the worst you can get is an error like "orphan record fragment".)
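
The "tail first" idea can be sketched the same way. This is a toy model
under my own assumptions (a fragment is stored as a pair of data and the
number of the next page), not the real on-disk record format, and the
function names are hypothetical. max_writes simulates a crash after that
many page writes.

```python
# Toy model of writing a multi-page record tail first; names are hypothetical.

def write_record_tail_first(pages, page_numbers, chunks, max_writes=None):
    """Store chunks on the given pages, last fragment first.

    Each stored fragment is (data, next_page). Returns True if the whole
    record was written, False if the simulated crash cut it short.
    """
    writes = 0
    next_page = None
    for page_no, chunk in reversed(list(zip(page_numbers, chunks))):
        if max_writes is not None and writes == max_writes:
            return False  # "crash": the remaining fragments never reach disk
        pages[page_no] = (chunk, next_page)
        next_page = page_no
        writes += 1
    return True


def dangling_pointers(pages):
    # Fragments whose next_page refers to a page that was never written.
    return [p for p, (_, nxt) in pages.items()
            if nxt is not None and nxt not in pages]


outcomes = []
for crash_at in range(4):
    pages = {}
    done = write_record_tail_first(pages, [10, 11, 12], ["he", "ad", "!!"], crash_at)
    outcomes.append((done, sorted(pages), dangling_pointers(pages)))

for outcome in outcomes:
    print(outcome)
```

Whatever the crash point, the fragments already on disk never point to a
missing page; an interrupted write leaves only tail fragments that nothing
refers to yet.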

> Probably a db page that spans more than one file system cluster is already
> a consistency problem.

Yes, if the database page is larger than a disk sector, you should think twice
before shutting down the server by unplugging it.
(Partially written sectors should be detected by the disk itself.)