| Subject | Re: [firebird-support] Exclude records from index |
|---|---|
| Author | Kjell Rilbe |
| Post date | 2011-06-14T14:36:06Z |
Ann Harrison skriver:
> On Tue, Jun 14, 2011 at 2:00 AM, Kjell Rilbe <kjell.rilbe@...
> <mailto:kjell.rilbe%40datadia.se>> wrote:
>
> >>
> >> I have a log table with millions of records, but maybe only 5-10% have a
> >> value in this specific field. And if I ever do a query using this value,
> >> only records with values <> NULL or 0 are relevant.
> >
> > I also have a few of those. I have two tables with 150 million records
> > each, and for a couple of associations (in UML terms) I have rather few
> > "entries" but for those that exist, having an index for joins is
> > essential. So, a few indices on these large tables have perhaps 98%
> > NULLs.
>
> How big are those indexes? (gstat will tell you) It seems to me there
> are two or possibly three concerns. The marginal concern is the cost of
> maintaining index entries for all the null values, which is negligible
> unless they're very volatile. The less serious concern is the space
> itself - wasting all those valuable pages on disks that now cost, let
> me see, $100/terabyte or so. The more serious concern is the depth of
> the indexes and the related cost of access. That one is worth measuring
> against the cost of adding a second table.
>
> So, here's the way you figure that out. A second table, however small, will
> have at least an index root page and a pointer page, plus a few data pages,
> plus a few index pages, all of which will have to be read from disk on first
> reference and hopefully cached. The large index may be as much as four
> levels deep, but all the significant pages are clustered together and take
> up about as much space as the index on the smaller table. Index lookups
> start at the top, so you'll need to keep the whole path from the top in
> cache, but even if the big index is four levels and the small is two, that's
> two extra pages.
>
> So, for the "second table" solution, you've added code complexity to
> your application, plus the need to keep the index root, pointer page,
> and data pages in cache as opposed to keeping two index pages around.
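Ann's "second table" alternative could be sketched roughly as below. All table and column names here are hypothetical, purely for illustration; the idea is that only the 2% or so of rows that actually carry a value are mirrored into a small side table, whose index stays shallow:

```sql
-- LOG_TABLE is the big table; LOG_VALUE is the sparse side table
-- (hypothetical names). Only rows that have a value are inserted here,
-- so the index covers ~2% of the rows instead of 100%.
CREATE TABLE LOG_VALUE (
  LOG_ID BIGINT NOT NULL PRIMARY KEY,  -- same key as LOG_TABLE
  SOME_VALUE INTEGER NOT NULL
);
CREATE INDEX IDX_LOG_VALUE ON LOG_VALUE (SOME_VALUE);

-- Queries that need the value join through the side table:
SELECT l.*
FROM LOG_TABLE l
JOIN LOG_VALUE v ON v.LOG_ID = l.ID
WHERE v.SOME_VALUE = 42;
```

The application-side cost Ann mentions is keeping LOG_VALUE in sync: every insert, update, or delete of the value in LOG_TABLE must be mirrored into the side table, either in application code or via triggers.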
Interesting reading. Seems to me that you ended up saying that for RAM
usage and select performance considerations it is better to keep the
large index? Seems reasonable.
But you missed one consideration that may actually be of some
significance: backup time and backup space (which we rent per Gbyte).
Anyway, we're going to stick with the current solution until we find
"real" problems with it. It works OK as it is.
Still, it would seem like a "nice-to-have" to be able to exclude nulls
from an index. I assume this would mean that queries that need nulls
(possibly in combination with non-nulls) would be executed as if the
index didn't exist, while queries that need only non-nulls could use
the index when the execution plan is created.
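For reference, this "partial index" feature exists in other engines (PostgreSQL, for example), and Firebird itself later added it in version 5.0. A sketch, with hypothetical table and column names:

```sql
-- Hypothetical names. Only rows where SOME_VALUE is present are indexed,
-- so the ~98% NULL entries never enter the index at all.
CREATE INDEX IDX_LOG_VALUE
  ON LOG_TABLE (SOME_VALUE)
  WHERE SOME_VALUE IS NOT NULL;
```

The optimizer then considers such an index only for queries whose WHERE clause implies the index condition (e.g. `SOME_VALUE = 42`), which matches the planner behaviour described above.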
Regards,
Kjell
--
------------------------------
Kjell Rilbe
DataDIA AB
E-post: kjell.rilbe@...
Telefon: 08-761 06 55
Mobil: 0733-44 24 64