Subject RE: [firebird-support] Hypothetical near-match search
Author Nigel Weeks
> I missed the start of this thread due to some e-mail
> problems, but I'm very interested in what might come out of it.
>
> About a year ago I investigated approximate matching a bit
> and tried to augment the Levenshtein distance (or something
> similar) to include these operations:
>
> - add char (ab->acb)
> - remove char (abc->ac)
> - change char (a->b)
> - swap chars (ab->ba)
>
> It's the swap that I wanted to add. It wasn't all that
> difficult - I have a working(?) implementation in D7. But
> only the actual distance calculation mind you - nothing
> adapted for FB.
>
> I also wanted to create a similar algorithm but for phonetic
> matching in some way. SoundEx and similar algorithms are too
> weak. I would need to include all sounds in the strings but
> make mismatches in similar sounds, e.g. "g" and "k", have low
> weight in the comparison.
>
> Who started the thread? Could you keep me posted on your progress?

I started the thread, basically with an idea for an indexing system that
breaks out individual words from all tables, columns, and rows in your DB,
and stores:
The uppercase version (so 'LIKE' operators can be used)
The Soundex representation (it's sloppy, but it helps)
The Metaphone representation (a bit better)
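
To make the idea concrete, here is a rough sketch in Python of what building those index entries might look like. It is only an illustration under assumptions: the function names (soundex, index_rows) and the tuple layout are made up for this example, the Soundex variant is the common American one, and Metaphone is left out because it is too long to show here.

```python
import re

# Letter-to-digit table for the common American Soundex variant.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of word ('' if it has no letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def index_rows(table, column, row_id, text):
    """Hypothetical helper: one index entry per word found in a column value.

    Each entry is (table, column, row_id, uppercase word, Soundex code).
    """
    entries = []
    for word in re.findall(r"[A-Za-z]+", text):
        entries.append((table, column, row_id, word.upper(), soundex(word)))
    return entries
```

Each tuple would become one row in the index table, so a search could hit the uppercase column with LIKE or the Soundex column with a plain equality test.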

I'm considering running the Levenshtein comparison ONLY on the matches
returned from the DB, rather than having Firebird calculate it. If someone's
got a UDF that'll do it, this might change!

If you jump into the firebird-support group on groups.yahoo.com, search for
'near-match', you should see all the conversations :-)

Nige.