Subject SELECT with CTE and LIST: Duplicates with DISTINCT and incorrect ordering in the main select
Author Svein Erling Tysvær

Yesterday I tried to select using a CTE containing LIST(DISTINCT), and then the main (outer) select tried to group by this LIST. The result ended up with incorrect ordering as well as duplicates. Trying to make a reproducible test case, I didn’t get the duplicates, but rather the list contained duplicate values. I did a search to see if others have observed the same thing, but didn’t find anything (though I’m far better with Firebirds SELECT statement than internet searching, so this could well be just me).

 

Here’s the query (“reproducible” test case):

 

with tmp(id, greeting) as

(select list(distinct 'I should go last'), list(distinct substring('Hi' from 1 for 5))

from rdb$database

left join rdb$database r2 on (1=1)

union

select list(distinct 'I shouldn''t duplicate'), list(distinct substring('Hello' from 1 for 5))

from rdb$database

left join rdb$database r2 on (1=1)

union

select list(distinct 'I shouldn''t duplicate'), list(distinct substring('Hello' from 1 for 5))

from rdb$database

left join rdb$database r2 on (1=1))

select greeting, list(distinct id), count(distinct id)

from tmp

group by 1

order by Greeting

 

Expected result:

Hello     I shouldn’t duplicate      1

Hi            I should go last 1

 

Actual result (on Firebird 2.5.4, probably with ISO8859_1 as the character set and may actual query may have had NO_NO as the collation):

Hi            I should go last 1

Hello     I shouldn’t duplicate,I shouldn’t duplicate          2

 

How the wrong result would have been if I had been able to completely make a copy of my original error (random sorting and lots of duplicates):

Hi            I should go last 1

Hello     I shouldn’t duplicate      1

Hello     I shouldn’t duplicate      1

 

I wrote the query to answer one particular question, so I could get a similar result to what I wanted by first running only the content of the CTE and then using PivotTable in Excel, but I was very surprised by bumping into what I consider being two separate errors inside one query (duplicates and ordering), or maybe even 2,5 errors since my original query returned duplicate rows rather than duplicates within LIST(DISTINCT…).

 

Does anyone have similar experiences or an explanation?

Set