Subject | Indexing for OLAP |
---|---|
Author | Richard Wesley |
Post date | 2006-04-06T16:45:54Z |
We are using an embedded version of Firebird to copy data from large
data warehouses to the user's local machine to allow them to do "off-
line" data analysis. One of the things I am trying to figure out is
some sort of optimal indexing strategy for generic data.
My first pass only indexed "dimensional" columns (dates, strings with
- most of the discussions here seemed to suggest that was all that
the optimiser would bother using anyway;
- testing suggested that extraneous indexes increased file size and
reduced performance;
I am starting to see performance issues because we do our filtering
via joins. It looks like some indexes would be useful for join
columns. One test gives a 5x improvement by adding an index to a
column with only 40 values in a 1Mrow table - a selectivity close to 0!
In this example, the plan uses the index to sort the data, so it
looks like the optimiser is set up to Do The Right Thing here. My
question is, how does the optimiser choose indexes for joins/sorts?
All the discussions I have seen focus on using it in selects, but it
looks like my heuristics need to have a different focus.
TIA
________________________________________________________
Richard Wesley Software Engineer
Tableau Software
http://www.tableausoftware.com/ hawkfish
tableausoftware com
data warehouses to the user's local machine to allow them to do "off-
line" data analysis. One of the things I am trying to figure out is
some sort of optimal indexing strategy for generic data.
My first pass only indexed "dimensional" columns (dates, strings with
>20% selectivity) because:- our filtering tends to be dimensional;
- most of the discussions here seemed to suggest that was all that
the optimiser would bother using anyway;
- testing suggested that extraneous indexes increased file size and
reduced performance;
I am starting to see performance issues because we do our filtering
via joins. It looks like some indexes would be useful for join
columns. One test gives a 5x improvement by adding an index to a
column with only 40 values in a 1Mrow table - a selectivity close to 0!
In this example, the plan uses the index to sort the data, so it
looks like the optimiser is set up to Do The Right Thing here. My
question is, how does the optimiser choose indexes for joins/sorts?
All the discussions I have seen focus on using it in selects, but it
looks like my heuristics need to have a different focus.
TIA
________________________________________________________
Richard Wesley Software Engineer
Tableau Software
http://www.tableausoftware.com/ hawkfish
tableausoftware com