Subject Re: [IB-Architect] Disk data integrity (was: Torn Page detection)
Author Charlie Caro
"Leyne, Sean" wrote:
>
> Steen,
>
> In addition to ensuring that the disk write completed properly, the
> checksum has other advantages/uses - it ensures that no data has been
> changed by an "outside" process, whether accidentally (hard disk errors)
> or deliberately (admittedly this is a very crude method which could be
> easily overcome by a 15-year-old)
>
> Your posting does raise a very interesting question.
>
> Can someone (preferably, other than Ann - she has enough to do) explain
> how the current engine ensures disk data integrity.
>

I guess I qualify as "other than Ann". The current engine doesn't do
anything to ensure disk data integrity except for a 'weak' page type
check on each fetch of a page. There is database shadowing and RAID
mirroring, but neither helps if the page is corrupted in memory and
then rewritten to disk.
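To make "weak" concrete: on an ordinary fetch the check amounts to
little more than verifying that the page's type byte matches what the
caller expects. A rough sketch in C (names and layout here are
illustrative, not the actual engine source):

/* Hypothetical sketch of the 'weak' page type check made on each
 * fetch.  A page whose body is garbage but whose type byte happens
 * to survive will still pass this test. */
#include <stdio.h>

typedef struct pag {
    unsigned char pag_type;    /* pointer page, data page, index page ... */
    unsigned char pag_flags;
    /* ... rest of the page image ... */
} PAG;

static int fetch_check(const PAG *page, unsigned char expected_type)
{
    if (page->pag_type != expected_type) {
        fprintf(stderr, "bugcheck: wrong page type\n");
        return 0;
    }
    return 1;
}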

If an underlying disk sector for a logical database page is corrupted
on-disk, then the associated CRC (cyclic redundancy check) hardware
checksums will detect the error. If it's a smart peripheral, it will
retire the bad block and revector it to somewhere else on the disk
before the data is lost.
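For reference, the per-sector redundancy check that drive hardware
applies is conceptually just the following computation (a minimal
software CRC-32 sketch; real drives implement stronger ECC in
firmware):

/* Minimal CRC-32 (reflected, polynomial 0xEDB88320) over a sector
 * buffer -- a software sketch of the kind of per-sector redundancy
 * check disk firmware performs. */
#include <stddef.h>
#include <stdint.h>

uint32_t crc32_sector(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
    }
    return ~crc;
}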

> Of particular interest is the way writes of large pages (16KB, when
> the sector size is 512 bytes) are handled, as well as updates to
> single rows which span multiple DB pages.
>

I suggested to Ann duplicating the page generation field at the end of
the page to detect page tears (mentioned in an earlier post by someone
else) as a minimal, cheap detection mechanism. I assume you know the
maximum page size is 8KB, not 16KB, but let's entertain the larger
page sizes because they're perfectly within the realm of future ODS
changes.
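The idea, in sketch form (field names are hypothetical, not actual ODS
layout): keep one copy of the generation in the page header, echo it
in the last bytes of the page, and compare the two on read.

/* Sketch of the proposed tear check: the page generation is written
 * at the front of the page and echoed in its final bytes.  A write
 * torn between the first and last sector leaves the copies unequal. */
#include <stdint.h>

#define PAGE_SIZE 8192

typedef struct {
    uint32_t pag_generation;       /* front copy, bumped on each write */
    uint8_t  body[PAGE_SIZE - 2 * sizeof(uint32_t)];
    uint32_t pag_generation_echo;  /* trailing copy */
} PAGE;

void page_before_write(PAGE *page)
{
    page->pag_generation++;
    page->pag_generation_echo = page->pag_generation;
}

int page_is_torn(const PAGE *page)
{
    return page->pag_generation != page->pag_generation_echo;
}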

The problem occurs when a large database page is striped across multiple
disks under RAID; it's possible for the first and last disk sectors to
complete while the intermediate sectors are never written. That kind of
tear would not be detected by the page generation id at the front and
end of the page. More esoteric, storage-expensive RAID configurations
can probably help.
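To see why, take a hypothetical 16KB page striped as 32 sectors of 512
bytes: if only the first and last sectors reach the disk, both
generation copies agree on the new write, yet the middle of the page is
stale. Simulating that against the sketch above:

/* Simulates the striping hazard: only the first and last sectors of
 * the new page image land on disk.  Both generation copies now match,
 * so a front/back comparison (page_is_torn() above) reports no tear,
 * even though the intermediate sectors are stale. */
#include <string.h>
#include <stdint.h>

#define SECTOR 512

void torn_raid_write(uint8_t *on_disk, const uint8_t *new_image,
                     size_t page_size)
{
    memcpy(on_disk, new_image, SECTOR);             /* first sector lands */
    memcpy(on_disk + page_size - SECTOR,            /* last sector lands  */
           new_image + page_size - SECTOR, SECTOR);
    /* the sectors in between never reach the platters */
}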

It also doesn't detect in-memory corruption (think about the recent
defect of multiple generators writing past the end of a page). IMHO,
it's not worth removing the 2-byte page header checksum, and it might
actually be worth the flexibility to conditionally enable it if it
doesn't affect YOUR application's level of acceptable performance.
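For what it's worth, the kind of 2-byte whole-page checksum at issue is
cheap to sketch (layout and names hypothetical; this is not the
engine's actual routine): sum the page with the checksum slot itself
skipped, store the folded result on write, recompute and compare on
read.

/* Hypothetical whole-page checksum: a 16-bit fold of a sum over the
 * page, skipping the checksum slot itself.  Stored before a write,
 * recomputed and compared after a read. */
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 8192

typedef struct {
    uint8_t  pag_type;
    uint8_t  pag_flags;
    uint16_t pag_checksum;   /* the 2 bytes/page under discussion */
    uint8_t  body[PAGE_SIZE - 4];
} PAGE;

static uint16_t compute_checksum(const PAGE *page)
{
    const uint16_t *words = (const uint16_t *)page;
    uint32_t sum = 0;
    for (size_t i = 0; i < PAGE_SIZE / sizeof(uint16_t); i++)
        if (i != 1)                       /* word 1 is the checksum slot */
            sum += words[i];
    return (uint16_t)(sum ^ (sum >> 16));
}

void page_write_prepare(PAGE *page)
{
    page->pag_checksum = compute_checksum(page);
}

int page_read_verify(const PAGE *page)
{
    return page->pag_checksum == compute_checksum(page);
}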

This detection mechanism will be invaluable in an SMP incarnation when
there is concurrent, multi-threaded access to the cache subsystem. In
development DEBUG mode you will want to enable checksums on in-memory
page fetches (as opposed to only page disk reads/writes) to catch
synchronization bugchecks or rogue memory writes.
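Concretely, a DEBUG build could route every logical cache fetch through
the same verify routine used around physical I/O (a hypothetical hook,
reusing page_read_verify() from the sketch above):

/* Hypothetical DEBUG-only hook: verify the in-memory page image on
 * every logical fetch, not just around disk reads/writes, so a rogue
 * memory write into the buffer pool trips a bugcheck at the next
 * touch of the page rather than at the next disk write. */
#include <assert.h>

#ifdef DEBUG_PAGE_CHECKSUMS
#define CACHE_FETCH_CHECK(page, type)            \
    do {                                         \
        assert(page_read_verify(page));          \
        assert((page)->pag_type == (type));      \
    } while (0)
#else
#define CACHE_FETCH_CHECK(page, type) ((void)0)
#endif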

BTW, even if checksum capability were re-enabled, I would argue for a
default of deactivated checksums. The performance cost is just too
great and the occurrence of checksum errors too infrequent to justify
making it the default. It's just not worth eliminating the checksum
from the ODS to save 2 bytes/page, especially if you ever wanted to
use it again; then you would have to wait years for the next ODS
change.

Regards,
Charlie