Subject | Select/index efficiency question |
---|---|
Author | Daniel G. Wilson |
Post date | 2003-11-29T01:58:21Z |
After reading all the threads lately about search efficiencies and indices, which I have been following avidly as I have been fighting select & insert efficiency in my own applications, I took another look at my databases, and found an index in one table with many millions of duplicate records. In light of the slow insertion speeds I have been seeing, and the comments expressed, I decided to try to fix this. Here are the relevant portions of the two tables involved:
RECREATE TABLE STOCKTABLE
(
STOCK_ID BIGINT NOT NULL ,
SYMBOL SYMBOL NOT NULL,
....
CONSTRAINT STOCKTABLEPRIMARYKEY1 PRIMARY KEY (STOCK_ID)
);
RECREATE TABLE STOCKPRICETABLE
(
STOCKPRICE_ID BIGINT NOT NULL ,
STOCK_ID BIGINT NOT NULL ,
....
CONSTRAINT PRIMARY_STOCKPRICE_ID PRIMARY KEY (STOCKPRICE_ID)
);
The stock table currently has about 2,000 rows, and is not expected to grow beyond about 15,000 rows.
The stockpricetable currently has about 20 million rows, and is expected to grow to about 1.5 billion rows.
Originally, I had a foreign key relationship between stockpricetable.stock_id and stocktable.stock_id. To increase insert speeds, after reading the comments here, I removed the foreign key relationship and load the stock table into memory in the data-import application at application start, then do in-memory lookups based upon stock symbols via an AVL tree to get stock_id values to feed to the stockpricetable insert operations. This has greatly helped the data import speed.
However, the applications that use the database frequently want to select records from the stockpricetable based upon the stock_id:
SELECT STOCKPRICE_ID FROM STOCKPRICETABLE WHERE STOCK_ID = :STOCK_ID;
So I created an index:
CREATE ASC INDEX STOCK_PRICE_STOCK_INDEX ON STOCKPRICETABLE( STOCK_ID);
The above select then used this plan: PLAN(STOCKPRICETABLE INDEX(STOCK_PRICE_STOCK_INDEX)) and ran quite quickly.
This index has many millions of duplicates, as a single stock id is found in many tens of thousands of rows in the stockpricetable. Data inserts are slowed significantly by this index: removing it almost doubled import speed. Without it, though, selects are incredibly slow, as one would expect:
PLAN(STOCKPRICETABLE NATURAL)
After reading the latest threads, and thinking I understood them, I tried this:
CREATE ASC INDEX STOCK_PRICE_STOCK_INDEX ON STOCKPRICETABLE(STOCKPRICE_ID,STOCK_ID);
However, the plan selected is still STOCKPRICETABLE NATURAL.
So clearly, I have misunderstood the heart of the select query efficiency conversation that has been going on here. If some expert could please enlighten me as to what I am missing, I would greatly appreciate it! Or should I just leave the original index in place, inactivate it at the start of a bulk-insert, and then reactivate it at the end of the insert, and live with the millions of duplicates?
TIA,
Dan.
RECREATE TABLE STOCKTABLE
(
STOCK_ID BIGINT NOT NULL ,
SYMBOL SYMBOL NOT NULL,
....
CONSTRAINT STOCKTABLEPRIMARYKEY1 PRIMARY KEY (STOCK_ID)
);
RECREATE TABLE STOCKPRICETABLE
(
STOCKPRICE_ID BIGINT NOT NULL ,
STOCK_ID BIGINT NOT NULL ,
....
CONSTRAINT PRIMARY_STOCKPRICE_ID PRIMARY KEY (STOCKPRICE_ID)
);
The stock table currently has about 2,000 rows, and is not expected to grow beyond about 15,000 rows.
The stockpricetable currently has about 20 million rows, and is expected to grow to about 1.5 billion rows.
Originally, I had a foreign key relationship between stockpricetable.stock_id and stocktable.stock_id. To increase insert speeds, after reading the comments here, I removed the foreign key relationship and load the stock table into memory in the data-import application at application start, then do in-memory lookups based upon stock symbols via an AVL tree to get stock_id values to feed to the stockpricetable insert operations. This has greatly helped the data import speed.
However, the applications that use the database frequently want to select records from the stockpricetable based upon the stock_id:
SELECT STOCKPRICE_ID FROM STOCKPRICETABLE WHERE STOCK_ID = :STOCK_ID;
So I created an index:
CREATE ASC INDEX STOCK_PRICE_STOCK_INDEX ON STOCKPRICETABLE( STOCK_ID);
The above select then used this plan: PLAN(STOCKPRICETABLE INDEX(STOCK_PRICE_STOCK_INDEX)) and ran quite quickly.
This index has many millions of duplicates, as a single stock id is found in many tens of thousands of rows in the stockpricetable. Data inserts are slowed significantly by this index: removing it almost doubled import speed. Without it, though, selects are incredibly slow, as one would expect:
PLAN(STOCKPRICETABLE NATURAL)
After reading the latest threads, and thinking I understood them, I tried this:
CREATE ASC INDEX STOCK_PRICE_STOCK_INDEX ON STOCKPRICETABLE(STOCKPRICE_ID,STOCK_ID);
However, the plan selected is still STOCKPRICETABLE NATURAL.
So clearly, I have misunderstood the heart of the select query efficiency conversation that has been going on here. If some expert could please enlighten me as to what I am missing, I would greatly appreciate it! Or should I just leave the original index in place, inactivate it at the start of a bulk-insert, and then reactivate it at the end of the insert, and live with the millions of duplicates?
TIA,
Dan.