firebird-architect - Re: Full Text Search

Subject	Re: Full Text Search
Author	Roman Rokytskyy
Post date	2005-02-07T15:11:44Z

> Jakarta Lucene is one of the packages I have looked at, but for me
> it has a major drawback - "written entirely in Java" :)

There is a C++ port, but I do not know how good it is.

> WHAT it is doing is no different to what we want to do. The problem
> is HOW it does it - by building a large number of files with all the
> various indexes in, and I could not work out just how many files it
> needed for a simple index ;)

Each file, including its format, is described on the Lucene website.
In general, there are frequency files, word files, and the actual
index segments (i.e. nodes).

> The only difference between Lucene and an internal full text search
> is the management of the index and data.

That's not true. The biggest difference is that Lucene knows what it
returns as the answer to a query: document IDs. And a document is a
collection of fields. We still do not know what we want to return. Jim
claims that we should return a collection of all matching records from
different tables (e.g. one record from the CUSTOMER table, two from
the ADDRESS table, and so on). I'm not comfortable with this, but so
far nobody has suggested any other solution.
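
To make the difference concrete, here is a minimal sketch in Python.
It is a toy inverted index, not Lucene's API and not a proposal for
Firebird; the sample rows, the "table" tag, and the function names
build_index and search are all made up for illustration. It shows a
search that returns document IDs only, and then the extra step needed
to turn those IDs into "records per table".

```python
from collections import defaultdict

# Hypothetical sample data: each "document" is an ID plus named fields.
# The "table" entry is only a tag saying where the row came from.
documents = {
    1: {"table": "CUSTOMER", "name": "Smith Trading", "notes": "prefers email"},
    2: {"table": "ADDRESS", "street": "12 Smith Lane", "city": "Leeds"},
    3: {"table": "ADDRESS", "street": "7 Smith Road", "city": "York"},
}


def build_index(docs):
    """Map each lower-cased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field_name, value in fields.items():
            if field_name == "table":
                continue  # the tag itself is not searchable text
            for word in value.lower().split():
                index[word].add(doc_id)
    return index


def search(index, word):
    """Return matching document IDs, and nothing else."""
    return sorted(index.get(word.lower(), set()))


index = build_index(documents)
hits = search(index, "smith")

# Grouping the IDs by source table is a separate step that the index
# itself knows nothing about.
by_table = defaultdict(list)
for doc_id in hits:
    by_table[documents[doc_id]["table"]].append(doc_id)

print(hits)
print(dict(by_table))
```

In this sketch the search result is just a list of IDs. The grouping
into one CUSTOMER record and two ADDRESS records happens afterwards,
outside the index, and that is exactly the part we have not defined.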

Lester, let's assume we have this full-text feature. How do you want
to use it? Please sketch some queries.

Roman