Subject Re: [firebird-support] Question about indexes
Author David Johnson
There is no "right" answer, because both are correct under different circumstances. It is a matter of the specific problem to be solved. It's a place where the "favorite hammer" syndrome causes a lot of misunderstandings. If you have the database and the queries built, change the indeces and performance test your particular business logic.

In my designs, I tend to use the database as an object repository with a GUID (char 38)and a version number (integer) for the primary key. In this design paradigm, I rarely need any more indeces than the two column primary key, and RI is handled by the application framework rather than the DBMS so I don't have those considerations to impact performance.

However, when working in other designs I sometimes find a specific need for a compound key in frequently used queries that are a performance bottleneck. For example, we had a payroll query that had an exclusion clause based on three fields. Exclusion clauses are capital N Nasty in any DBMS because they depend on inverted logic [ for example: select ... where not (a in (select a from x where c=? and B=? and D=?)) ]. I am not sure that this would be legal in FB SQL (it works in DB2 and Access), but without a compound index this must do a table space scan. With a million records in the table, that is a million I/O, which will take a while. Add the compound index, and you go to three I/O on the index, and one per selected row on the target table. If you expect to return 1,000 rows, you are down to 1,003 I/O.

By comparison, the use of 3 simple indeces required roughly 300,000 I/O.

By creating a four field compound index with the three fields that were part of the exclusion and the field that was, coincidentally, the primary key of the related table that was being joined, the query's performance dropped from roughly 1 hour to 10 seconds. The DBMS only had to look at the index, not the record, so I/O was cut by a couple of orders of magnitude.

Indexes are tested using a left-to-right comparison mechanism written in C. A good compound index will have the search fields ordered from most distinct to least distinct, followed by the target column where applicable. In other words, if I have a table with a hundred columns, and I need to get a value (A) given three others (B, C, D), then I will build my index on columns B, C, D, A. This index would be good for :

Search on B to return C, D, and/or A
Search on B and C to return D, and/or A
Search on B, C and D to return A

But it would not be especially useful for anything else.

A well designed compound index will produce least I/O for both insert and select. A badly designed one will not aid anything, and may even cause unnecessary I/O.

Regardless of the DBMS, it is a matter of being able to do the math for the particular circumstances (and DBMS) under consideration and counting the number of times that you need to go to the physical storage. With modern hardware, allow 10 ms per I/O. Count the number of pages that need to be physically read to get the index entries, then count one page for each table entry. There are sometimes surprises as you go from one DBMS to another, but for the most part you can depend on "real work" queries (as opposed to artificial break me queries) to behave similarly across today's DBMS's.

I depend heavily on empirical testing, especially on something as easy to change as index structures. By testing empirically for the various situations, and asking the right questions, you can learn a lot about the innards of the system and tune your solution to the level that benefits your users the most.

----- Original Message -----
From: Russell Eva
Sent: Monday, March 08, 2004 6:13 AM
Subject: [firebird-support] Question about indexes

I keep on having this argument with everyone and it seems there are as many
documented proofs as there are people.

On a table with many columns what is recommended (in Firebird)
Many simple indexes (ie one field per index) to allow the RDBMS to work it
out for itself
Few complex indexes based on the majority of the querys fired against the
table, taking into consideration that a query where clause interrogating
columns a, b, and c may use an index comprising a, b, c, and d.

All documentation that I have read advise against indexes on columns that
have minimal values (eg active_flag (y/n) etc), and in my experience the
more indexes on a table, the worse the performance of inserts etc. I do
realize that this varies from DB to DB, but at this stage I am only
interested in FB.

My preference is to have a simple pk (ie a generator) (this also keeps fk's
simple) and then
Complex indexes based on table use.
What is the general feeling in this respect?

Yahoo! Groups Links

a.. To visit your group on the web, go to:

b.. To unsubscribe from this group, send an email to:

c.. Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.

[Non-text portions of this message have been removed]