Subject Re: [firebird-support] Fuzzy / Full-text/ Soundex searching...
Author Ivan Prenosil
> I don't have any experience with Fuzzy / Full-text/ Soundex searching,
> and I'm kind of wondering what the difference is between these three.

Full-text searching is, in general, a method for locating documents quickly.
Simply scanning 1,000,000 documents for the relevant words
can take half an hour, while a full-text search takes half a second.
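
To make the difference concrete, here is a minimal sketch of the usual idea
behind full-text search: an inverted index that maps each word to the
documents containing it, so that a query never has to read the documents
themselves. The function names and the simple tokenizer are my assumptions
for illustration, not any particular product's API.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Build an inverted index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The index is built once (and maintained on updates); each query then costs
a few dictionary lookups and set intersections instead of a scan.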

Soundex is a method for matching words based on how they sound,
and using it makes sense only for "problematic" languages (like English)
where the written and spoken forms of words differ.
It is important to know that the algorithm is different for each language.
(I tried to use English Soundex on Czech data, and the result
was as expected, i.e. unusable.)
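
For reference, a sketch of the classic English (American) Soundex as it is
commonly described: keep the first letter, replace the remaining consonants
with digit codes, collapse runs of the same code, and pad to four
characters. The letter groups are tuned to English spelling, which is why
it does poorly on other languages. Dropping non-ASCII letters is a
simplification of mine, not part of the algorithm.

```python
# Letter groups of English Soundex; vowels, y, h and w have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character English Soundex code of a word."""
    # Simplification (assumption): anything outside a-z is ignored,
    # so accented letters such as 'é' or 'č' are simply lost.
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]
```

Two words are considered a match when their codes are equal, e.g.
soundex("Robert") == soundex("Rupert").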

"Fuzzy search" means finding "similar" words, but the term is fuzzy itself,
because it is being used for several different methods.
(E.g. some people use the term fuzzy search when they use a thesaurus.)
The method I know/use splits each word into bigrams or trigrams
and then returns those words that match by at least x percent.
This method is very effective at tolerating spelling/typing mistakes,
and also for languages that inflect ("bend") words (like Czech).
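
A minimal sketch of the trigram variant, under assumptions of my own: words
are padded with "_" on both sides, similarity is the share of common
trigrams among all trigrams of both words (Jaccard), and the threshold of
0.3 is arbitrary. Real implementations differ in all three choices and use
an index on the n-grams instead of comparing against every word.

```python
def trigrams(word, n=3):
    """Return the set of n-grams of a word, padded with '_' on both sides."""
    padded = "_" + word.lower() + "_"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b):
    """Jaccard similarity of the n-gram sets: 0.0 (nothing shared) to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def fuzzy_search(query, words, threshold=0.3):
    """Return the words at least `threshold` similar to query, best first."""
    scored = [(similarity(query, w), w) for w in words]
    return [w for score, w in sorted(scored, reverse=True) if score >= threshold]
```

With these choices the misspelling "firebrid" still shares four of its
trigrams with "firebird" and is found, while unrelated words score zero.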


Soundex is usually a very simple addition to basic full-text search
(you just convert each word to a code, and then compare these
codes for equality). Good fuzzy search is much more complex,
but also more effective (imo).
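
The "convert to code, compare for equality" step can be sketched as an index
keyed by the code instead of by the word. The crude_key function below is
only a stand-in of mine to keep the example short (it is NOT Soundex); any
function that maps a word to a code, such as a language-specific Soundex,
can be passed as encode.

```python
from collections import defaultdict


def crude_key(word):
    """Stand-in phonetic code (NOT Soundex, purely illustrative).

    Keeps the first letter, drops a/e/i/o/u/y/h/w after it, and skips a
    letter that equals the last letter already in the key.
    """
    word = "".join(c for c in word.lower() if c.isalpha())
    if not word:
        return ""
    key = word[0]
    for c in word[1:]:
        if c not in "aeiouyhw" and c != key[-1]:
            key += c
    return key


def build_code_index(words, encode=crude_key):
    """Map each code to the set of indexed words that produce it."""
    index = defaultdict(set)
    for w in words:
        index[encode(w)].add(w)
    return index


def lookup(index, query, encode=crude_key):
    """Return the indexed words whose code equals the code of the query."""
    return index.get(encode(query), set())
```

A lookup is then an exact match on the code, so it can reuse whatever
index structure the basic full-text search already has.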


> What would you recommend in this situation?

Your requirements are quite fuzzy :-) You have not specified
the volume of your data, the volume of daily updates, whether you index
the data e.g. once a day or in real time, which platforms you want
to support, and which features you need (exact match,
wildcards, soundex, fuzzy search, indirect search, stop list, normalizing
words, ordering by relevance, searching several fields/tables at once,
stupid-query-resistance, etc).
E.g. for a very low volume of data (hundreds of documents),
a simple UDF that just scans all blobs (i.e. without indexing)
can perform quite well.
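
The no-index approach amounts to a linear scan. Here is a sketch in Python
rather than as an actual UDF, with an in-memory dict standing in for the
table of blobs (both are my assumptions for illustration):

```python
def scan(docs, word):
    """Return ids of documents whose text contains word, ignoring case."""
    needle = word.lower()
    return [doc_id for doc_id, text in docs.items() if needle in text.lower()]
```

The cost grows with the total size of the text, which is acceptable for
hundreds of documents but not for millions.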

Ivan
http://www.volny.cz/iprenosil/interbase/


----- Original Message -----
From: "Jonathan Neve" <jonathan@...>
To: "firebird-support" <firebird-support@yahoogroups.com>
Sent: Friday, October 29, 2004 5:13 PM
Subject: [firebird-support] Fuzzy / Full-text/ Soundex searching...


>
> Hi all,
>
> I don't have any experience with Fuzzy / Full-text/ Soundex searching,
> and I'm kind of wondering what the difference is between these three.
> I'm also looking for a tool that can allow me to do this sort of thing
> with FB/IB (I've found several, but I'd like your advice).
>
> What I need, is to be able to look up in my table all records similar to
> a certain string (typed in by the end-user), allowing for spelling
> mistakes, and such like. It has to sound similar, but not necessarily
> exactly the same. Also, the text will be French, so it has to work
> correctly with this.
>
> What would you recommend in this situation?
>
> Thanks!
> Jonathan Neve.