Subject | Re: [ib-support] Table size and indexing |
---|---|
Author | Andy Canfield |
Post date | 2001-06-19T00:17:43Z |
Significant amounts of this discussion are surprising me.
Suppose I have two tables:
CREATE TABLE Alpha (
AlphaIdent INTEGER NOT NULL PRIMARY KEY,
AlphaName VARCHAR(30));
CREATE TABLE Beta (
BetaIdent INTEGER NOT NULL PRIMARY KEY,
AlphaIdent INTEGER REFERENCES Alpha,
etc... );
Now as far as I know there are two indexes; an index for table Alpha on the values in Alpha.AlphaIdent and an index for table Beta on the values in Beta.BetaIdent. Your discussion seems to imply that there is another index on table Beta on values from column Beta.AlphaIdent. You talk about selectivity being low. If by selectivity you mean the ratio of the number of distinct values to the number of table rows, then the selectivity of a primary key index is always one hundred percent since primary keys must be unique. If I were to have an index on Beta.AlphaIdent then its selectivity could be much lower, yes. But I have not declared an index on Beta.AlphaIdent in the SQL DDL; was such a thing created automatically?
Apparently this discussion is saying that if table Alpha is small, say a dozen rows, database performance will improve by replacing "PRIMARY KEY" by "UNIQUE", that the index provided automatically by the "PRIMARY KEY" clause actually slows down the system if Alpha is small.
My main tables have lots of columns which are codes such as VarietyIdent (one letter), StationIdent (three letters), and ProblemIdent (two letters). These are foreign key references into the "reference" tables Variety, Station, and Problem. The Variety table has only two rows because there are only two legal variety codes, the Station table has 22 rows, the Problem table has 8 rows. These reference tables are set up at the beginning of the year and not changed during that year.
In the small reference tables should I change "PRIMARY KEY" to "UNIQUE", thus eliminating the index? Are the "REFERENCES" clauses in the major tables creating implied indexes that I don't need or want?
Suppose I have two tables:
CREATE TABLE Alpha (
AlphaIdent INTEGER NOT NULL PRIMARY KEY,
AlphaName VARCHAR(30));
CREATE TABLE Beta (
BetaIdent INTEGER NOT NULL PRIMARY KEY,
AlphaIdent INTEGER REFERENCES Alpha,
etc... );
Now as far as I know there are two indexes; an index for table Alpha on the values in Alpha.AlphaIdent and an index for table Beta on the values in Beta.BetaIdent. Your discussion seems to imply that there is another index on table Beta on values from column Beta.AlphaIdent. You talk about selectivity being low. If by selectivity you mean the ratio of the number of distinct values to the number of table rows, then the selectivity of a primary key index is always one hundred percent since primary keys must be unique. If I were to have an index on Beta.AlphaIdent then its selectivity could be much lower, yes. But I have not declared an index on Beta.AlphaIdent in the SQL DDL; was such a thing created automatically?
Apparently this discussion is saying that if table Alpha is small, say a dozen rows, database performance will improve by replacing "PRIMARY KEY" by "UNIQUE", that the index provided automatically by the "PRIMARY KEY" clause actually slows down the system if Alpha is small.
My main tables have lots of columns which are codes such as VarietyIdent (one letter), StationIdent (three letters), and ProblemIdent (two letters). These are foreign key references into the "reference" tables Variety, Station, and Problem. The Variety table has only two rows because there are only two legal variety codes, the Station table has 22 rows, the Problem table has 8 rows. These reference tables are set up at the beginning of the year and not changed during that year.
In the small reference tables should I change "PRIMARY KEY" to "UNIQUE", thus eliminating the index? Are the "REFERENCES" clauses in the major tables creating implied indexes that I don't need or want?
----- Original Message -----
From: "Helen Borrie" <helebor@...>
To: <ib-support@yahoogroups.com>
Sent: Monday, June 18, 2001 18:16
Subject: RE: [ib-support] Table size and indexing
> At 10:25 AM 18-06-01 +0100, you wrote:
>
> > >Making foreign keys on other tables that reference these small tables will
> >DEFINITELY slow >things down - IB will form long duplicate chains in the
> >index tree of the mandatory index >and wreck performance. Instead, make
> >composite indexes on such columns, combining the >lookup column + the
> >primary key.
> >
> >I'd just about finished designing the database for an analysis system I'm
> >developing when I read the above. Following is a (stripped down) table which
> >records responses to direct response advertising campaigns:
> >
> >CREATE TABLE "RESPONSES"
> >(
> > "NUM" "POSITIVEBIGINT" NOT NULL,
> > "CAMPNUM" "POSITIVEBIGINT" NOT NULL,
> > "REGNUM" "POSITIVESMALLINT",
> > "CONTMODENUM" "POSITIVEBIGINT" NOT NULL,
> > "ADNUM" "POSITIVEBIGINT",
> >CONSTRAINT "RESP_PK" PRIMARY KEY ("NUM")
> >);
> >
> >CAMPNUM refers to the advertising campaign that generated the response.
> >CAMPNUM is likely to be in the hundreds (or very low thousands).
>
> Primary table in the structure?
>
> >REGNUM refers to the region of the country from which the response came.
> >REGNUM is about 12 and not likely to change.
>
> Selectivity is too low to justify an index for sorting or joining purposes.
>
> >CONTACTMODENUM refers to the method of contact used (e.g. web, phone).
> >CONTACTMODENUM has only 2 values currently and is not likely to change much,
> >say 10 or below.
>
> Same as above if you have these defined in a control table.
>
> >ADNUM refers to the advertisement (TV commercial, press ad, radio ad) that
> >generated the response. ADNUM will be in the thousands.
>
> Foreign key relationship and its index are both justified.
>
> >All these columns are implemeted as FK's to ensure data integrity and to
> >make use of the IB generated indexes.
>
> I don't see why the first two tables need to referenced to ensure data integrity. The cost of those indexes is large; if you need data integrity triggers, write your own. The IB-generated indexes will be terrible. If you want these columns indexed in the referring table for ordering purposes, define your own composite indexes, viz. low-integrity key + pri key to raise their selectivity.
>
> >Also, this keeps the design clean and
> >understandable.
>
> /* Comments */ have the same effect without forcing you to accept engine-style foreign key constraints that kill performance because of their low selectivity. You can have foreign keys in your database without using the db engine to enforce it!
>
> >Are you saying that none of these should be implemented as
> >FK's?
>
> Adnum is the only eligible FK I see there.
>
> >CONTACTMODENUM could be a domain with a check on the content, but
> >since they are likely to change I would rather not have to change the
> >database structure just to add a new contact method.
>
> Why would you have to change the database structure to add a new contact method ????? If it's a control table, you only need to add a new row.
>
>
> >What would be the best approach in the above 4 cases, from a performace
> >viewpoint? What's the scale of the performane 'hit' with the above approach?
> >If defining FK's as FK's produces a noticable performance penalty, is that
> >unique to IB?
>
> No, it's not unique to IB, just particularly noticeable in IB. As for why, and for the scale of the performance hit, have a look at Ann's posting...
>
> Cheers,
> Helen
>
> All for Open and Open for All
> InterBase Developer Initiative ยท http://www.interbase2000.org
> _______________________________________________________
>
> To unsubscribe from this group, send an email to:
> ib-support-unsubscribe@egroups.com
>
>
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
>
>