Subject Re: Index tales - part 2 - Keyword FTS
Author m_theologos
--- In Firebird-Architect@yahoogroups.com, Jim Starkey <jas@...>
wrote:
>
> Roman Rokytskyy wrote:
> > Jim,
> >
> >
> >> Text search needs to be multi-table and multi-field to be useful.
> >>
> >
> > I guess, you did not change your approach to this topic
> > significantly, right? Then your multi-table search can be used only
> > via API and result of such query is set of result sets, which cannot
> > be represented via SQL.
> >
> Yes, that's correct. The result of a search operation is a ResultList
> that can be iterated similar to a ResultSet, but the "value" of a
> ResultList is a ResultSet rather than a scalar value.

So the result leaves the server's engine, and all further processing
must be done on the client side, building the appropriate functions
from scratch. I would rather have a more SQL-like approach, using
JOINs, views and so on.
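
To show what I mean by an SQL approach, here is a minimal sketch in
Python. It is only an illustration under assumptions: SQLite (via the
standard sqlite3 module) stands in for the server, the tables
products / products_kw and the view product_search are hypothetical
names, and the tokenizer is a naive whitespace split. The point is
only that the keyword lookup stays an ordinary rowset that can be
joined and wrapped in a view.

```python
import sqlite3

# Assumption: SQLite stands in for the server; the products / products_kw
# tables and the product_search view are invented for this illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE products_kw (keyword TEXT, product_id INTEGER);
    CREATE INDEX products_kw_idx ON products_kw (keyword);
    CREATE VIEW product_search AS
        SELECT k.keyword, p.id, p.description
        FROM products_kw k JOIN products p ON p.id = k.product_id;
""")

def add_product(pid, description):
    """Store a row and its keywords (naive whitespace tokenizer)."""
    con.execute("INSERT INTO products VALUES (?, ?)", (pid, description))
    words = {w.lower() for w in description.split()}
    con.executemany("INSERT INTO products_kw VALUES (?, ?)",
                    [(w, pid) for w in words])

def search(keyword):
    """Keyword lookup expressed as plain SQL against the view."""
    rows = con.execute(
        "SELECT id, description FROM product_search "
        "WHERE keyword = ? ORDER BY id", (keyword.lower(),))
    return rows.fetchall()

add_product(1, "Red steel bolt")
add_product(2, "Blue steel nut")
add_product(3, "Red plastic cap")
```

Because product_search is just a view, the same lookup can be joined
with any other table on the server side instead of being post-processed
on the client.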

> > But MySQL as well as your Netfrastructure provide a keyword that
> > allowed to perform a query against single table (and they are not the
> > only ones - Oracle and MS SQL do similar things). Do you want to say
> > that MATCH/MATCHING operators and alike is not useful at all?
> >
> I have a customer who is quite happy using the MATCHING predicate
> against a single column. The implementation is a general search
> constrained to a single table and column. The semantics are otherwise
> the same.
>
> The Falcon search code will be part of the initial Falcon alpha code
> base even though it isn't accessible through MySQL. Like other MySQL
> code, it will be released under the GPL, but the ideas will be there
> for the taking. MySQL has an existing full text search capability
> that I would prefer not to comment on publicly.
> > I'd say that suggested keyword search is a simplified version of
> > MATCH operator that accesses a multi-field FTS index, however only
> > for one table. Having that case implemented would be of great benefit
> > for Firebird, considering amount of applications that use FTS via SQL.
> > Don't you agree?
> >

I agree, and I think it is easy to implement. Of course, if you want
a more advanced approach, things change; then we would need a more
dedicated structure, IMHO. If you're interested, drop me a line.

Also, please note that, generally speaking, each column to be
indexed tends to have its own vocabulary. For example:

In an ERP:

ACCOUNTS.DESCRIPTION will have quite a different vocabulary
compared with PRODUCTS.DESCRIPTION or MOVEMENT.REASON, even if,
let's say, all of these fields have the same data type (or domain).

In a CRM:

CONTACTS.ADDRESS will be lexically very different from
CONTACTS.NOTES or, even more so, from CONTACTS.EMAIL or CONTACTS.WWW
(the last one has other separators, other stop words etc.)
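
A small Python sketch of what "its own vocabulary" means in practice.
Everything specific here is an assumption for illustration only: the
ANALYZERS table, the separators and the stop-word lists are invented,
and only the column names are taken from the examples above.

```python
import re

# Hypothetical per-column settings; the separators and stop words
# are invented for illustration.
ANALYZERS = {
    "CONTACTS.NOTES": {"split": r"[^\w]+",
                       "stop": {"the", "a", "and", "to"}},
    "CONTACTS.WWW": {"split": r"[./:]+",
                     "stop": {"http", "https", "www", "com"}},
}

def keywords(column, text):
    """Split text with the column's own separators and drop its stop words."""
    cfg = ANALYZERS[column]
    tokens = re.split(cfg["split"], text.lower())
    return [t for t in tokens if t and t not in cfg["stop"]]

notes = keywords("CONTACTS.NOTES", "Call the customer and send a quote")
site = keywords("CONTACTS.WWW", "http://www.firebirdsql.org/index.html")
```

The same text would yield different keywords depending on the column
it belongs to, which is why one shared vocabulary for all columns tends
to carry a lot of entries that any single column never needs.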

So a multi-table index will be much bigger than what a single-table
FTS search needs. Also, building a multi-table structure needs new
procedures to handle updating, searching, retrieving and so on. I am
only proposing a useful feature (IMHO) which can be implemented
relatively simply, based on a (very) well-tested engine. If we later
want (let's say in version 4) to add proximity search, synonyms,
multi-table parallel keywords or multi-language fuzzy search, all
well and good.

I think (IMHO) it is better to implement things step by step (with no
stupid crap inside the engine, of course) than to leave a feature
unimplemented because we cannot make it perfect from the beginning.

> Personally, I believe that web style applications are better
> applications that the traditional File/Edit/View framework, and web
> application begin with search. There are good uses for a heavily
> restrictive search, however. In my book, implementing a multi-table,
> multi-field index then, if appropriate, filtering at the node walking
> level to a single table and field makes a great deal of sense.
>

...but please take into consideration that behind the web page is an
app which deals with concrete kinds of data organized in tables,
"kinds" from a _human_ point of view (I don't mean 'data types' here).
So, as you observed, 'There are good uses for a heavily restrictive
search...' In conclusion, I think that a multi-table, multi-field FTS
index is good to have, but having only this is a 'heavy' thing to deal
with, IMHO (no SQL, lack of speed, difficult to refine and so on).
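
To make the trade-off concrete, here is a toy Python sketch of one
shared multi-table, multi-field keyword index that is filtered down to
a single table or field at lookup time. This is not the Falcon or
Netfrastructure code, only my reading of the idea; the in-memory
dictionary, the add/search helpers and the sample rows are all
assumptions made for the illustration.

```python
from collections import defaultdict

# One shared index: keyword -> set of (table, field, row_id) postings.
# Table and field names below are only illustrative.
index = defaultdict(set)

def add(table, field, row_id, text):
    """Index every whitespace-separated word of one field value."""
    for word in text.lower().split():
        index[word].add((table, field, row_id))

def search(word, table=None, field=None):
    """Walk the postings for a keyword, optionally keeping one table/field."""
    hits = index.get(word.lower(), set())
    return sorted(h for h in hits
                  if (table is None or h[0] == table)
                  and (field is None or h[1] == field))

add("PRODUCTS", "DESCRIPTION", 1, "steel bolt")
add("ACCOUNTS", "DESCRIPTION", 7, "steel supplier account")
add("MOVEMENT", "REASON", 3, "returned bolt")
```

Note that a restricted search such as search("steel", table="PRODUCTS")
still has to walk the postings of every table before discarding most
of them, which is the overhead a single-table index would avoid.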

hth, (my 2c)

m. th.