Subject Re: [ib-support] Full join unusably slow
Author Helen Borrie
At 12:05 PM 17-10-02 +1000, you wrote:
>Hi,
>
>I was trying to add a field to indicate whether the user had selected a
>field or
>not. To make it transient and not takeup space in the table I added a table
>sm_selected with one field called id. The presence of value in sm_selected
>means
>that the corresponding record in sawmembers has been selected by the user
>and is
>hightlighted on his screen.
>
>The join query:
>select auto_number, id, station_id from sawmembers
>full join sm_selected on id = auto_number
>where status < 3
>order by station_id
>
>takes 100 s and returns, 38343 records

It is joining all records in both tables.


>The old-style query:
>select auto_number, station_id from sawmembers
>where status < 3
>order by station_id
>
>took 5 seconds

It isn't even interested in records where status >=3.


>select count(*) from sawmembers gives 426767
>
>The script to create the sm_selected table is:
>
>CREATE TABLE SM_SELECTED (
> ID INTEGER NOT NULL PRIMARY KEY
>)
>
>Any ideas about getting the speed back? I don't see a fundamental reason
>for it
>to take so long. It must be doing things in the wrong order.

My take is that your full join (not required) is defeating all other
optimization. To verify this, run your query through a tool with a monitor
and a profiler (such as IB_SQL).

Add to this that you have used ambiguous syntax and I'd be surprised if the
optimizer was anything but totally confused.

You could try

select sm.auto_number, sm.id, sm.station_id from sawmembers sm
join sm_selected ss on ss.id = sm.auto_number
where sm.status < 3
order by sm.station_id

But, since you are not returning any values from sm_selected, you might
like to see whether this works faster (and settle the itchiness I tend to
suffer when seeing joined tables that don't return any columns :-) ):

select sm.auto_number, sm.id, sm.station_id from sawmembers sm
where sm.status < 3
and (exists (select ss.id from sm_selected ss where ss.id = sm.auto_number))
order by sm.station_id

Contrary to Sean's advice, I would avoid indexing sm.status if it has a
very small distribution of different values in sm. That will make things
worse, not better. However, do make sure you have an index on station_id
if it has a reasonable or better distribution.

heLen