Subject Re: [ib-support] Full join unusably slow
Author Mark Patterson
Helen Borrie wrote:

> It is joining all records in both tables.

Why? What a wicked waste of valuable resources. I just want to basically the
same as the old query, but for each record look up SM_SELECTED and have it put
NULL in the ID field if the look up fails. It was doing that. The result set was
what I wanted.

> >The old-style query:
> >select auto_number, station_id from sawmembers
> >where status < 3
> >order by station_id
> >
> >took 5 seconds

>
>
> It isn't even interested in records where status >=3.

Nor should the new one.

>
>
>
> >select count(*) from sawmembers gives 426767
> >
> >The script to create the sm_selected table is:
> >
> >CREATE TABLE SM_SELECTED (
> > ID INTEGER NOT NULL PRIMARY KEY
> >)
>
> My take is that your full join (not required) is defeating all other
> optimization. To verify this, run your query through a tool with a monitor
> and a profiler (such as IB_SQL).

I'm not familiar with IB_SQL. I haven't got much out of profilers in the past.

>
> Add to this that you have used ambiguous syntax and I'd be surprised if the
> optimizer was anything but totally confused.

It's not ambiguous - the field name "ID" is unique to the SM_SELECTED table.
Once the query is parsed what difference does a lack of dots make? Is the
optimizer having a huff or something?

>
>
> You could try
>
> select sm.auto_number, sm.id, sm.station_id from sawmembers sm
> join sm_selected ss on ss.id = sm.auto_number
> where sm.status < 3
> order by sm.station_id

That's a different result set. That only returns records from SAWMEMBERS that
have a match on SM_SELECTED. I want to use whether ID is null to set the colour
of line on a DBGrid, in Delphi.

> But, since you are not returning any values from sm_selected, you might
> like to see whether this works faster (and settle the itchiness I tend to
> suffer when seeing joined tables that don't return any columns :-) ):
>
> select sm.auto_number, sm.id, sm.station_id from sawmembers sm
> where sm.status < 3
> and (exists (select ss.id from sm_selected ss where ss.id = sm.auto_number))
> order by sm.station_id

Hmmm... I think you ans Sean both misread my intentions. I was getting the right
result set from

select auto_number, id, station_id from sawmembers
full join sm_selected on id = auto_number
where status < 3
order by station_id

It would have been easier for humans to parse if I'd said

select SM.auto_number, SS.id, SM.station_id
from sawmembers SM
full join sm_selected SS on SS.id = SM.auto_number
where SM.status < 3
order by SM.station_id

> Contrary to Sean's advice, I would avoid indexing sm.status if it has a
> very small distribution of different values in sm. That will make things
> worse, not better. However, do make sure you have an index on station_id
> if it has a reasonable or better distribution.


It's interesting that you would say not to index status. The sawmember table
will grow to millions with most of the records being archived, (STATUS = 4) i.e.
not of interest to the factory controller. Thus
SELECT DISTINCT JOB_NUMBER FROM SAWMEMBERS WHERE STATUS <= 3
and similar queries need to exclude all the archived ones or they will take
ages. I've done timing tests on them.

I would like to know if it's possible to do what I was trying to do without the
100s wait. In the meantime we're procedding with a different approach.

Thanks for your help. I guess I didn't make my posting clear.

Regards
Mark Patterson