Subject Re: Sorting and filtering by UDF result
Author Matt Bucknall
Hi Svein,

Interesting approach, although I was trying to avoid an 'indexing'
table that would be much larger than the original table, especially as
these tables will be quite large themselves.

In my original post, I perhaps oversimplified my problem, so I will
explain in more detail.

I need to search a number of columns in a table (this is generic, so
any columns in any table) counting up given keywords that could be
contained in one or more columns, at any position and then order the
results starting with rows that had the most keyword matches, whilst
filtering out any rows that had no matches.

The way I search the content of the columns is to concatenate their
contents for each row (replacing null entries with empty strings using
COALACE), passing that as the resulting string to F_MATCHCOUNT as its
first argument. The 2nd argument to F_MATCHCOUNT is a comma delimited
list of keywords to search the first argument for. The total count of
any of these keywords found in the string is returned by F_MATCHCOUNT.

The problem I have is that F_MATCHCOUNT does quite a lot of work, and
I am not happy about calling it twice in the SQL query (once to filter
out no match results, again to order the results). What I would like
to do is something like this:

SELECT table_name.*, F_MATCHCOUNT(concatenated_columns_string,
keywords) AS result FROM table_name WHERE result>0 ORDER BY result DESC;

..but of course, firebird does not recognise 'result' as an actual
column, so the query is illegal.

Thanks for the suggestions so far, sorry for not being clear to start
with!

Matt.



--- In firebird-support@yahoogroups.com, Svein Erling Tysvær
<svein.erling.tysvaer@...> wrote:
>
> Don't know if it helps, but you could try
>
> SELECT * FROM TABLE_NAME
> WHERE COLUMN_NAME CONTAINING 'foo'
> AND F_MATCHCOUNT(COLUMN_NAME, 'foo')>0
> ORDER BY F_MATCHCOUNT(COLUMN_NAME, 'foo') DESC;
>
> If you can put some logic like only being interested in words starting
> with 'foo' into your search, then you could add another table
> INDEXED_TABLE containing the two fields WORD and PK_TABLE_NAME, use
> triggers on TABLE_NAME to fill it and link to this one when doing your
> query. I guess you could do this even if you do not know whether 'foo'
> is in the beginning or middle of words, though the INDEXED_TABLE would
> be considerably bigger than the original table if a COLUMN_NAME
> contained e.g. 100 characters and you had to store this into 100
> records - one record where each position in the field where first in
> the string. Though it would be interesting to learn the result of such
> an exercise and it would make your F_MATCHCOUNT irrelevant:
>
> SELECT T.*, COUNT(DISTINCT IT.PK) FROM TABLE_NAME T
> JOIN INDEXED_TABLE IT ON T.PK = IT.T_PK
> WHERE IT.INDEXED_FIELD STARTING 'FOO'
> GROUP BY T.*
> ORDER BY 2
>
> HTH,
> Set
>
> --- In firebird-support@yahoogroups.com, "Matt Bucknall" wrote:
> > Hello,
> >
> > I have written a UDF ( F_MATCHCOUNT(string, pattern) ) that returns
> > the number of times a given pattern string occurs in another string.
> > Is there a more efficient way I can perform the following query on
> > Firebird 1.5, where I only need to call the UDF once?
> >
> > SELECT * FROM TABLE_NAME WHERE F_MATCHCOUNT(COLUMN_NAME, 'foo')>0
> > ORDER BY F_MATCHCOUNT(COLUMN_NAME, 'foo') DESC;
> >
> > Thanks,
> > Matt.
>