Subject Re: [firebird-support] Exclude records from index
Author Ann Harrison
On Tue, Jun 14, 2011 at 2:00 AM, Kjell Rilbe <kjell.rilbe@...> wrote:

>>
>> I have a log table with millions of records, but maybe only 5-10% have a
>> value in this specific field. And if I ever do a query using this value,
>> only records whose value is neither NULL nor 0 are relevant.
>
> I also have a few of those. I have two tables with 150 million records
> each, and for a couple of associations (in UML terms) I have rather few
> "entries" but for those that exist, having an index for joins is
> essential. So, a few indices on these large tables have perhaps 98% NULLs.


How big are those indexes? (gstat will tell you.) It seems to me there are
two or possibly three concerns. The marginal one is the cost of maintaining
index entries for all the null values, which is negligible unless
they're very volatile.
The less serious concern is the space itself - wasting all those valuable pages
on disks that now cost, let me see, $100/terabyte or so. The more serious
concern is the depth of the indexes and the related cost of access. That one
is worth measuring against the cost of adding a second table.
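
For example, something along these lines - mydb.fdb is only a placeholder,
use your own database path:

  gstat -i mydb.fdb

and look at the "Depth:" line reported under the index in question (your
gstat may also accept -t <tablename> to limit the output to one table). The
node and leaf bucket counts give a feel for how much of the tree is just
NULL entries.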

So, here's the way you figure that out. A second table, however small, will
have at least an index root page and a pointer page, plus a few data pages,
plus a few index pages, all of which will have to be read from disk on first
reference and hopefully cached. The large index may be as much as four
levels deep, but the significant pages - the upper levels you keep coming
back to - are clustered together and take up about as much space as the
whole index on the smaller table. Index lookups start at the top, so you
need to keep the whole path from the top in cache, but even if the big
index is four levels and the small one is two, that's only two extra pages.
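
To put very rough numbers on that (purely illustrative - say 150 million
keys and a few hundred compressed nodes per index page; your gstat figures
will differ): 150,000,000 keys need about 500,000 leaf pages, those need
about 1,700 pages at the next level, those need about 6 pages above that,
plus the root - four levels. The same arithmetic on the roughly 3 million
keys that actually carry a value comes out one or two levels shallower.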

So, for the "second table" solution, you've added code complexity to
your application, plus the need to keep the index root, pointer page,
and data pages in cache as opposed to keeping two index pages around.
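
For what it's worth, the second-table variant usually looks something like
the sketch below - every name in it is made up, and the narrow table has to
be kept in step with the log table by your application or by triggers:

  CREATE TABLE LOG_REF (
      LOG_ID  BIGINT NOT NULL PRIMARY KEY,  -- the log row that actually has a value
      REF_ID  BIGINT NOT NULL               -- the value that is NULL or 0 in most log rows
  );
  CREATE INDEX IX_LOG_REF_REF ON LOG_REF (REF_ID);

  -- lookups then touch only the 5-10% of rows that matter:
  SELECT L.*
    FROM LOG_REF R
    JOIN LOG_ENTRY L ON L.ID = R.LOG_ID
   WHERE R.REF_ID = ?;

Whether that buys you anything over leaving the sparse index on the big
table is exactly the depth-versus-extra-pages trade-off above.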

Good luck,

ann