Subject Re: Questions about SELECT DISTINCT *
Author Svein Erling Tysvær
--- In firebird-support@yahoogroups.com, "robert_difalco" wrote:
> As you all know, a DISTINCT on many fields can be quite slow as it
> needs to order unique all the fields in each row. This can get
> really slow if there are large varchars in the result values.
>
> Normally when I perform a SELECT DISTINCT * what I really mean is
> SELECT DISTINCT uid.
>
> Is there a standard SQL way to just designate just ONE of the result
> values as DISTINCT rather than having them all be processed for
> distinct handling? Maybe somehow using GROUP BY or some such?
>
> Since I'm not very imaginative, the best I can think of is changing
> something like:
>
> SELECT DISTINCT * FROM Table [JOINS][WHERE];
>
> To:
>
> SELECT * FROM Table A WHERE A.uid IN
> SELECT DISTINCT uid FROM Table [JOINS][WHERE];
>
> Of course the only problem with this general pattern is that it wont
> work if I'm creating the Join to add extra result values that are
> not columns in "Table".
>
> Thoughts?
>
> R.

Yup, my first thought was "What a lousy idea, this won't even get him
the right result"! If RowA and RowB have the same UID, then both will
match the subselect and hence be returned. I cannot even think of a
case where the DISTINCT word of 'IN (SELECT DISTINCT...)' makes any
difference.

Hopefully, each row has a primary key. If so, then what you can do is:

SELECT * FROM TableA A WHERE NOT EXISTS(
SELECT * FROM TableA B WHERE B.UID = A.UID AND B.PK < A.PK)

For returning all records, I believe using SELECT DISTINCT is faster
than any alternative, though of course - the field in question must be
indexed (with low selectivity I also add the primary key to the
index). Though I wouldn't be surprised if the above select was quicker
to return the first few records.

HTH,
Set