Subject RE: left join / first row select
Author

 Dear Svein,


First thanks for your feedback.

As I mentioned in my first email I cannot use "distinct" instruction in my query.


Here is example of the logs data taken from flamerobin (in the table is 33k rows): 


Preparing query: SELECT a.CON_ID, a.* FROM XRD_RC_DATA_VIEW_F a

Prepare time: 3.338s

Field #01: XRD_RC_DATA_VIEW_F.CON_ID Alias:CON_ID Type:INTEGER

Field #02: XRD_RC_DATA_VIEW_F.CON_ID Alias:CON_ID Type:INTEGER

Field #03: XRD_RC_DATA_VIEW_F.MIN_FWHM_GAN_002 Alias:MIN_FWHM_GAN_002 Type:FLOAT

Field #04: XRD_RC_DATA_VIEW_F.AVG_FWHM_GAN_002 Alias:AVG_FWHM_GAN_002 Type:DOUBLE PRECISION

Field #05: XRD_RC_DATA_VIEW_F.MAX_FWHM_GAN_002 Alias:MAX_FWHM_GAN_002 Type:FLOAT

Field #06: XRD_RC_DATA_VIEW_F.COUNT_FWHM_GAN_002 Alias:COUNT_FWHM_GAN_002 Type:INTEGER

Field #07: XRD_RC_DATA_VIEW_F.MIN_FWHM_ALN_002 Alias:MIN_FWHM_ALN_002 Type:FLOAT

Field #08: XRD_RC_DATA_VIEW_F.AVG_FWHM_ALN_002 Alias:AVG_FWHM_ALN_002 Type:DOUBLE PRECISION

Field #09: XRD_RC_DATA_VIEW_F.MAX_FWHM_ALN_002 Alias:MAX_FWHM_ALN_002 Type:FLOAT

Field #10: XRD_RC_DATA_VIEW_F.COUNT_FWHM_ALN_002 Alias:COUNT_FWHM_ALN_002 Type:INTEGER

Field #11: XRD_RC_DATA_VIEW_F.MIN_FWHM_GAN_102 Alias:MIN_FWHM_GAN_102 Type:FLOAT

Field #12: XRD_RC_DATA_VIEW_F.AVG_FWHM_GAN_102 Alias:AVG_FWHM_GAN_102 Type:DOUBLE PRECISION

Field #13: XRD_RC_DATA_VIEW_F.MAX_FWHM_GAN_102 Alias:MAX_FWHM_GAN_102 Type:FLOAT

Field #14: XRD_RC_DATA_VIEW_F.COUNT_FWHM_GAN_102 Alias:COUNT_FWHM_GAN_102 Type:INTEGER

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC3 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC9 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC3 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC9 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC3 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC9 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A A NATURAL)



Executing...

Done.

0 fetches, 0 marks, 0 reads, 0 writes.

0 inserts, 0 updates, 0 deletes, 127020 index, 960 seq.

Delta memory: -72 bytes.

Total execution time: 14.561s

Script execution finished.

 

If I use DISTINCT


Preparing query: SELECT DISTINCT a.CON_ID, a.* FROM XRD_RC_DATA_VIEW_F a

Prepare time: 3.962s

Field #01: XRD_RC_DATA_VIEW_F.CON_ID Alias:CON_ID Type:INTEGER

Field #02: XRD_RC_DATA_VIEW_F.CON_ID Alias:CON_ID Type:INTEGER

Field #03: XRD_RC_DATA_VIEW_F.MIN_FWHM_GAN_002 Alias:MIN_FWHM_GAN_002 Type:FLOAT

Field #04: XRD_RC_DATA_VIEW_F.AVG_FWHM_GAN_002 Alias:AVG_FWHM_GAN_002 Type:DOUBLE PRECISION

Field #05: XRD_RC_DATA_VIEW_F.MAX_FWHM_GAN_002 Alias:MAX_FWHM_GAN_002 Type:FLOAT

Field #06: XRD_RC_DATA_VIEW_F.COUNT_FWHM_GAN_002 Alias:COUNT_FWHM_GAN_002 Type:INTEGER

Field #07: XRD_RC_DATA_VIEW_F.MIN_FWHM_ALN_002 Alias:MIN_FWHM_ALN_002 Type:FLOAT

Field #08: XRD_RC_DATA_VIEW_F.AVG_FWHM_ALN_002 Alias:AVG_FWHM_ALN_002 Type:DOUBLE PRECISION

Field #09: XRD_RC_DATA_VIEW_F.MAX_FWHM_ALN_002 Alias:MAX_FWHM_ALN_002 Type:FLOAT

Field #10: XRD_RC_DATA_VIEW_F.COUNT_FWHM_ALN_002 Alias:COUNT_FWHM_ALN_002 Type:INTEGER

Field #11: XRD_RC_DATA_VIEW_F.MIN_FWHM_GAN_102 Alias:MIN_FWHM_GAN_102 Type:FLOAT

Field #12: XRD_RC_DATA_VIEW_F.AVG_FWHM_GAN_102 Alias:AVG_FWHM_GAN_102 Type:DOUBLE PRECISION

Field #13: XRD_RC_DATA_VIEW_F.MAX_FWHM_GAN_102 Alias:MAX_FWHM_GAN_102 Type:FLOAT

Field #14: XRD_RC_DATA_VIEW_F.COUNT_FWHM_GAN_102 Alias:COUNT_FWHM_GAN_102 Type:INTEGER

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC9 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC3 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC9 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC3 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC9 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC3 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC3 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC9 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC3 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC9 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC3 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN (A DATA_XRD_RC ORDER IDX_DATA_XRD_RC9 INDEX (IDX_DATA_XRD_RC1, IDX_DATA_XRD_RC4))

PLAN SORT ((A A NATURAL))



Executing...

Done.

0 fetches, 0 marks, 0 reads, 0 writes.

0 inserts, 0 updates, 0 deletes, 5761198 index, 33335 seq.

Delta memory: 121544 bytes.

Total execution time: 0:03:51 (hh:mm:ss)

Script execution finished. 


As you see distinct (3:51 min vs 15sec) really slow down query.  I have impression I use all needed indexing as so on. 

That is why I was looking for option where I can select only first row from the query:


>SELECT * FROM DATA_SUMMARY a
>left join

>(select first 1 * from DATA_VIEW) as c on c.CON_ID_2=a.CON_ID_1  


unfortunately it doesn't work. 

If you have any other suggestions please let me know. 

Regards
Andrzej

--- In firebird-support@yahoogroups.com, <svein.erling.tysvaer@...> wrote:

>I try to solve this problem already few days without success.
>I am using WI-V2.5.2.26540 Firebird 2.5 (windows 7).
>I have two tables:
>
>TABLE 1:
>DATA_SUMMARY:
>CON_ID_1 Data1
>1 A
>2 Bb
>3 Dd
>4 EE
>…
>1000 ABC
>
>TABLE 2:
>DATA_VIEW
>CON_ID_2 Data2 Data3
>1 D1 D1
>1 D1 D1
>2 D2 D2
>2 D2 D2
>2 D2 D2
>3 D3 Z3
>3 D3 Z3
>4 D4 Z4
>5 D5 Z5
>
>As a results I would like to have
>
>Results:
>CON_ID_1 Data1 CON_ID_2 Data2 Data3
>1 A 1 D1 D1
>2 Bb 2 D2 D2
>3 Dd 3 D3 Z3
>4 EE 4 D4 Z4
>…
>1000 ABC NULL NULL NULL
>
>So idea is I combine TABLE 1 and TABLE 2 based on CON_ID_1 and CON_ID_2 but in table2 I can have few identical rows with the same CON_ID_2 value.
>
>I tried to use such query
>
>SELECT * FROM DATA_SUMMARY a
>left join
>(select first 1 * from DATA_VIEW) as c on c.CON_ID_2=a.CON_ID_1
>
>unfortunetly it gives only NULL in the data_view tables part.
>
>I cannot use DISTINCT(c.CON_ID_2) option because it take 2.5 min to sort complete table (both tables have few thousand rows)
>
>On the internet I found only solutions for oracle and mysql but it doesn't for FB
>
>Any suggestions how to solve such problem?

Hi Kukiejko, sorry to hear you've already used days to solve this question.

A few thousand rows shouldn't normally take 2.5 minutes. Do you have an index for c.CON_ID_2?

From your problem description, I would say that the solution (provided you have indexes for CON_ID_1 or CON_ID_2) is as simple as:

SELECT DISTINCT a.CON_ID_1, a.Data1, c.CON_ID_2, c.Data2, c.Data3
FROM DATA_SUMMARY a
JOIN DATA_VIEW c on c.CON_ID_2=a.CON_ID_1

Now, I doubt that is the solution to your problem, but that is because I do not think you are telling us everything needed to solve it and the result of this query should be correct if the tables only contained the test data you provided. To solve your real problem (provided it is not as simple as I wrote above), please tell us the table structure including indexes and provide test data where your desired output differs from the output you would get from the select I wrote in the paragraph above.

A more likely answer to what you are wondering about would be something like (you may replace RDB$DB_KEY with the primary key of the table):

SELECT * FROM DATA_SUMMARY a
left join DATA_VIEW c on c.CON_ID_2=a.CON_ID_1
where not exists(select * from DATA_VIEW c2 where c.CON_ID_2 = c2.CON_ID_2 and c.RDB$DB_KEY > c2.RDB$DB_KEY)

but I am still not certain what your question really is, so it is rather flukey if this is the answer you're after.

Hope this either helps or results in a question that will lead to a better answer,
Set