Subject | Re: [firebird-support] Slow performance with Index |
---|---|
Author | Ann W. Harrison |
Post date | 2005-02-03T17:06:16Z |
Edwin A. Epstein, III wrote:
across a network to a client application, so I'm going to assume that
whatever you're doing is happening on the server.
The elementary rules about indexes include not indexing fields that
aren't very selective - the example given is normally gender. There are
counter examples - a gender index is useful when searching for females
in the set of military officers prior to 1920. Oracle has a bitmap
index designed to help in indexing values with huge numbers of
duplicates. But you're using Firebird, so let me explain why this index
can be a good idea in a few cases and why it's a bad idea in other cases.
If you have very few (a few tens of thousand out of 14 million) records
with DNC_QUED = 1, and want to find those records, the index will help.
That's good , but doesn't generalize all that well, since from time to
time you will need to get the other records and the index will hurt
performance. You can tell Firebird not to use index in one of two ways:
invert the equality (i.e. DNC_QUED NOT = 1), or turn the value into an
expression (e.g. DNC_QUED + 0 = 0).
If you know that the records are clustered by value, the index will
improve performance by restricting the number of pages you need to read
What does that mean and how could you know that? When you insert a row,
Firebird puts it in the most convenient empty space. Space is
convenient if it is on a page that is currently in the cache, if it's on
a page allocated to the table and not already full, if it's on a page
allocated to the database but not currently in use. Otherwise, Firebird
extends the database file and allocates a new page to a table. If
you've got a volatile database - lots of records stored, modified, and
erased - records will tend to be all over the place. If your database
is more archival, mostly storing, very little modification or erasing,
then newer records will be at the end and older records at the beginning.
So, for a volatile database, records won't tend to be clustered by age,
but for an archival database, they will. If you know that older records
will have one value for DNC_QUED and newer records will have the other
value, then there's hope that the index will do you some good because
all the records with one value will be on one set of pages and all the
other pages will contain only records of the other value.
The cost of a retrieval is basically the number of pages you have to
read. If every page of the table has some data you need, a simple table
scan is faster than reading the index, constructing a bit map of the
records that match the criteria, then reading every page.
But in general, don't index two-valued fields.
Regards,
Ann
> I have a table with 14 million records in it. As part of certain operationsFirst, no professional programmer would ever send 7 million records
> I need to do against the records I flag them individually as part of a que.
> I am using a field called DNC_QUED which is a VarChar(1) and ASCII character
> set. I have non-unique index on just the field alone. There are no null
> values and the only values are '1' or '0'.
across a network to a client application, so I'm going to assume that
whatever you're doing is happening on the server.
The elementary rules about indexes include not indexing fields that
aren't very selective - the example given is normally gender. There are
counter examples - a gender index is useful when searching for females
in the set of military officers prior to 1920. Oracle has a bitmap
index designed to help in indexing values with huge numbers of
duplicates. But you're using Firebird, so let me explain why this index
can be a good idea in a few cases and why it's a bad idea in other cases.
If you have very few (a few tens of thousand out of 14 million) records
with DNC_QUED = 1, and want to find those records, the index will help.
That's good , but doesn't generalize all that well, since from time to
time you will need to get the other records and the index will hurt
performance. You can tell Firebird not to use index in one of two ways:
invert the equality (i.e. DNC_QUED NOT = 1), or turn the value into an
expression (e.g. DNC_QUED + 0 = 0).
If you know that the records are clustered by value, the index will
improve performance by restricting the number of pages you need to read
What does that mean and how could you know that? When you insert a row,
Firebird puts it in the most convenient empty space. Space is
convenient if it is on a page that is currently in the cache, if it's on
a page allocated to the table and not already full, if it's on a page
allocated to the database but not currently in use. Otherwise, Firebird
extends the database file and allocates a new page to a table. If
you've got a volatile database - lots of records stored, modified, and
erased - records will tend to be all over the place. If your database
is more archival, mostly storing, very little modification or erasing,
then newer records will be at the end and older records at the beginning.
So, for a volatile database, records won't tend to be clustered by age,
but for an archival database, they will. If you know that older records
will have one value for DNC_QUED and newer records will have the other
value, then there's hope that the index will do you some good because
all the records with one value will be on one set of pages and all the
other pages will contain only records of the other value.
The cost of a retrieval is basically the number of pages you have to
read. If every page of the table has some data you need, a simple table
scan is faster than reading the index, constructing a bit map of the
records that match the criteria, then reading every page.
But in general, don't index two-valued fields.
Regards,
Ann