Subject Re: [firebird-support] Hypothetical near-match search
Author Alexandre Benson Smith
Hi Nigel !

Nigel Weeks wrote:

>Please excuse my ramblings, this is just an idea, and I'm wondering if
>people are doing something similar
>I'd like an indexing system equipped with Soundex and Metaphone codes, for
>fuzzy matching of words. I think I can do it.
>Here's how:
>/* A table containing every unique word in your database */
>CREATE TABLE srch$word (
> srch$id INT64 NOT NULL, /* A unique ID for each word */
> srch$word VARCHAR(200) NOT NULL, /* The word, in uppercase */
> srch$sdx VARCHAR(10), /* The Soundex Representation of the word */
> srch$meta VARCHAR(50), /* The MetaPhone Representation of the word */
> PRIMARY KEY(srch$id),
> UNIQUE(srch$word)
>CREATE GENERATOR gen_srch$word.srch$id;
>/* A table referencing every table, field, and row, and the word contained
>therein */
>CREATE TABLE srch$locations (
> rdb$relation_name CHAR(31) NOT NULL, /* Which table is it found in? */
> rdb$field_name CHAR(31) NOT NULL, /* Which field is it found in? */
> rdb$db_key CHAR(8), /* Which row is it found in? */
> srch$id INT64, /* Link to the word */
> PRIMARY KEY(rdb$relation_name,rdb$field_name,rdb$DB_KEY,srch$id)
> /* A foreign key here might just slow things down. */
>A small spider could then run over the records, populating the index with
>A possible search query could then be:
>SELECT rdb$relation_name, rdb$field_name, rdb_db_key FROM srch$locations LOC
> JOIN srch$word WORD
> ON LOC.srch$id = WORD.srch$id
>WHERE WORD.srch$word = 'Smith' OR WORD.srch$sdx = 'S520'
>'S530' is the soundex for the words: Saint, Sand, Sandy, Santee, Santi,
>Schmid, Schmidt, Schmit, Schmitt, Shand, Shumate, Sinnott, Smith, Smithey,
>Smoot, Smooty, Smyth, Smythe, etc, etc.
>If I could then work out a clever join to the live tables to actually
>retrieve the rows, I'd be cooking!
>My question is, can anyone see any shortcomings in this approach?
>Nigel Weeks
>Tech Support & Systems Developer
>71 - 75 Paterson Street
>Tasmania 7250
>Phone: 03 6336 7234

I thinks it's feasible...

I think in a search framework, this framework will accept parameters
(like googles or altavista search phrases), the will perform the search
on the word-index table.

The result will be a list of records that match the search criteria, you
will get the table name, record ID, and a possible ranking.

The second phase will be some kind of interface that will show the
appropriate record (for example an invoce form, or a customer edit form,
or something that will let the users see the real data), this way you
don't need to make a join to the real table.

The problems:
1.) You will have to make an automation facility to generate triggers
(inser/update/delete) for each monitored table, that will split the
record information into "words" and classify put this data on the
Word-Index table.
2.) You will need some way to classify (score, relevance, etc.) the
search result.
3.) You will need to create a kind of plug-in that could be extended for
new kinds of records (new tables) when it is created on the DB
4.) Besides Soundex and Metaphone, I think a kind of synonym table
should be created too, for example if a user search for index the words
indexes, indices, key, keys, etc. should be treated as similar.

I developed a kind of knowledge base that is in beta (internal use only)
for techinical articles. I think it's working like a charm, but it's
really simple and limited yet, and it has a fixed structure, no new
tables are added, and just a small number of tables are monitored. To
make it dinamic and even more better easily expanded, you will need some
tools to help you and generate triggers code for you.

Did you have looked at Lucene ( ?
Maybe it is just what I, you and a lot of people are looking for...
Didn't looked deep into it, don't know how it could be integrated with FB.

see you !


Alexandre Benson Smith
THOR Software e Comercial Ltda.
Santo Andre - Sao Paulo - Brazil