Subject Re: How to create index for field in two tables and had any limitation
Author Adam
--- In firebird-support@yahoogroups.com, "Ann W. Harrison"
<aharrison@...> wrote:
>
> > <jeremylim2008@> wrote:
> >
> >> I am using firebird version 1.5, Delphi 7, bde and try
> >> to open tquery with below sql statement.
> >>
> >>"Select * From SalesOrder
> >> Left Join Customer on C.CustomerCode = S.CustomerCode
> >> order by customername "
> >>
> >>the customer records got 100, 000 records and sales order got
> >>around one million records.
> >>It took more than 2 minutes to open the query. as i know indexing
> >>will improve a lots the speed but indexing for 2 table I never try
before.
> >>
> >
> Adam wrote:
> >
> >
> > Indexing wont help a lot because you don't use a where clause on a
> > very large table.
>
> Adam is generally very knowledgeable about firebird, but he's having
> a bad day. Adding an index on the CustomerCode field in Customers
> will improve performance a lot. You're currently reading a million
> sales records and fore each sales record, you're reading 100,000
> customer records, for a total of 100,000,000,000 record reads. If
> you had an index on CustomerCode in Customer (that's the only field
> that will work) - you'd still read all 1,000,000 SalesOrder records
> but for each, you'd only look at three index pages (of which two are
> nearly certain to be in cache) and one Customer record. It's a huge
> win.
> >
>
> Regards,
>
>
> Ann
>

Hi Ann,

More dangers of not specifying anything about the database schema.

I assummed (incorrectly) that S.CustomerCode had a foreign key defined
and C.CustomerCode was a primary key and if that assumption was
correct, both fields would implicitly be indexed. I am not saying that
this is not true (I believe it is true), just that I had no right to
that assumption.

The PLAN I would expect from his original query IF the constraints are
defined as I expected would be:

PLAN SORT (JOIN (SALESORDER NATURAL,CUSTOMER INDEX (PK_CUSTOMERCODE)))

On the other hand, if (as you must suspect) these constraints are not
already defined, the PLAN will be.

PLAN SORT (JOIN (SALESORDER NATURAL,CUSTOMER NATURAL))

Which would be CTRL+ALT+DEL time for such a query. I doubt any
hardware available at any cost could run that in anywhere close to 2
minutes for tables of that size.

Neither of the above PLANs are sensible, but the first is (as you say)
a LOT better than the second. If he took my suggestion and changed the
left join to an inner join (which he probably can because a SalesOrder
probably requires a customer), I think the following plan should be
possible.

PLAN JOIN (CUSTOMER ORDER IX_CUSTOMERNAME,SALESORDER INDEX
(FK_CUSTOMERCODE))

If the client application had some data-aware grid that fetched only X
rows at a time, then such an approach could be reasonably fast
(providing they dont hit 'End').

Because it is using the index on CustomerName, this would allow the
results to start returning before without first retrieving and sorting
(most likely using disk with that volume of data). This means it may
be milliseconds before the first record is returned, rather than
minutes or hours.

However I maintain that it is pointless to deliver a grid with over a
million records in it. It would take several hours of pressing page
down to scroll through all the records. Not to mention the amount of
RAM you would need.

I must also point out that IF my suspicion about the CustomerCode
fields being primary and foreign keys is indeed true, then it is very
important that a second index is not defined. Apart from having no
benefit and taking up room in the database file, there are cases where
it has let to the optimiser choosing to use neither.

If you find yourself joining to a field, then it is a pretty good sign
that it should be a foreign key declaration.

Adam