firebird-architect - Re: Regarding the new SIMILAR TO in FB 2.5

Subject	Re: Regarding the new SIMILAR TO in FB 2.5
Author	dmarmur2002
Post date	2008-07-20T15:33:29Z

--- In Firebird-Architect@yahoogroups.com, "Leyne, Sean" <Sean@...> wrote:

>
> Dany,
>
> > When doing advanced work concerning language research and phonetics
> > regexps are extensively used. Amongst a plethora of advanced
> > validation tasks we use regexp for manipulating strings. In this
> > scenario it is extremely important not only to be able to tell if
> > there is a match, but also where in the string the (first) match was
> > found.
> >
> > Thus, in the expression
> >
> > 'banana' SIMILAR TO 'a+'
> >
> > we would need to have a return value of 2, not merely true
> >
> > 'banana' SIMILAR TO 'c+' could return 0.
> >
> > In the release notes for FB 2.5 I cannot see that this is possible.
> > One could also, in order to maintain the current design, wish for an
> > alternate predicate such as SIMILAR POS.
> >
> > Would this be possible?
>
> I don't think it is appropriate/necessary to complicate the current
> SIMILAR TO functionality.

OK, sorry. I was a bit to eager with my last post and should have
checked things up a bit more. I agree that the implementation of the
predicate SIMILAR TO should perhaps be left alone at this stage.

But I will continue to argue the need to know either *what* or *where*
matches occur. When performing string manipulation it is of course of
essence to be able to know where in a string one needs to split or
replace.

There should be a build-in function that uses the implementation of
SIMILAR TO to serve those of us who needs build string manipulation
routines on regexp functionality. Today you can only do manipulation
based on exact matches. This might be fine for such fuzzy algorithms
as soundex and metaphone, but it is not enough for serious language
and text analysis.

PostgreSQL delivers functions for this in two different ways.
I hope it's ok to refer to the chapter "9.7. Pattern Matching" in the
manual pages. Complementing SIMILAR TO is the ability to use pattern
matching in the SUBSTRING function. That would actually be enough for
my clients needs.

PostgreSQL goes on and delivers functions like regexp_replace,
regexp_matches and more. The manual pages indicates that the
underlying implementation is different from the SQL-based one used in
SIMILAR TO and SUBSTRING so IMHO I think we can live without those
extra functions.

If SUBSTRING could be used with regexp patterns one could take the
result from that function and put into the POSITION function and go on
from there. The nice twist of that implementation is that since
SUBSTRING returns a string I am also able to find out the length of
the match.

>
> Nothing prevents you from implementing your own SIMILAR TO function via
> a UDF.
>

This is the way it is done today. There are a lot of problems with
this approach. First of all the need to have a consistent
implementation across platforms, server models *and* character
sets/collations.

FB 2.1 has been extended with a set of build-in functions
that renders the need of UDFs a more obsolete. IMHO avoiding UDF
significantly reduces the development cost of a project which is why I
was extremely happy to notice these additions.

>
> > Another important thing that the release notes does not mention is
> > being able to use variables in PSQL on both sides of the new
> > predicate(s).
>
> That is because as a predicate, all of the existing rules for the
> handling of variables apply. There is nothing special about SIMILAR TO.

This is of course excellent :=)

Regards,

/Dany