Subject Re: Index structures
Author Jim Starkey
At 06:28 PM 6/10/03 +0000, Roman Rokytskyy wrote:
>Jim,
>
> > I'd rather not talk about it. Sorry.
>
>Excuse me if I'm too naughty... I spent some time on full text search
>in our application, but I'm still not sure whether my solution is too
>complex. I have integrated Lucene by introducing an artificial layer
>between the application and database layers, but I'm not sure if this is
>the best solution, since the inverted-file approach used by Lucene is
>alien to relational databases.


If you will excuse me from talking about internals I'd be more than happy
to talk about architecture.

If you will pardon my bluntness (OK, I'm not the BBW for nothing), the idea
of full text search as a SQL extension is close to braindead. I say "close"
because sometimes you do want to search specific columns of specific
tables. But that is so far from the general case (and a very easy extension
once you've solved the general problem) as to be almost not worth discussing.

The general case is "here's a database, I want to search it", just as when
you go to Google you say "here's a World Wide Web, I want to search it."
And there are certainly intermediate cases like "my application shares a
database with other applications, I just want to search my application."

What I did in Netfrastructure was to extend the JDBC Statement class
with the following:

public native NfsResultList search (String query) throws SQLException;
public native void setTableFilter (String tableName) throws SQLException;

where the class NfsResultList looks like:

public class NfsResultList
{
    public void finalize () throws SQLException
    {
        close();
    }

    public native int getCount();
    public native void close();
    public native boolean next();
    public native String getTableName();
    public native NfsResultSet fetchRecord();
    public native float getScore();
}

As you have probably already figured out, the search method applies a search
string (I use Alta-Vista semantics to advertise my age) and gives back a
ResultList object. By default, it searches all fields marked as "searchable"
in the database. If that's overkill (it usually is), I can set which tables
I wish searched, in which case the search is restricted to the fields in
those tables marked as "searchable."
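
To make the scoping rule concrete, here is a minimal in-memory sketch of how
the set of searched fields could be chosen. It is not Netfrastructure code:
the class SearchScope, the methods addField, restrictTo and fieldsToSearch,
and the table and column names are all hypothetical, and I am assuming that
an empty filter means "every table."

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the scoping rule; not the Netfrastructure implementation.
public class SearchScope {
    // One catalog entry: a column and whether it is marked "searchable".
    static final class Field {
        final String table;
        final String column;
        final boolean searchable;

        Field(String table, String column, boolean searchable) {
            this.table = table;
            this.column = column;
            this.searchable = searchable;
        }
    }

    private final List<Field> catalog = new ArrayList<>();
    private final Set<String> tableFilter = new HashSet<>();

    public void addField(String table, String column, boolean searchable) {
        catalog.add(new Field(table, column, searchable));
    }

    // Assumed semantics: an empty filter means "search every table".
    public void restrictTo(String... tableNames) {
        tableFilter.addAll(Arrays.asList(tableNames));
    }

    // Returns "table.column" for each field the search would cover.
    public List<String> fieldsToSearch() {
        List<String> result = new ArrayList<>();
        for (Field f : catalog) {
            boolean tableAllowed = tableFilter.isEmpty() || tableFilter.contains(f.table);
            if (f.searchable && tableAllowed) {
                result.add(f.table + "." + f.column);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SearchScope scope = new SearchScope();
        scope.addField("articles", "title", true);
        scope.addField("articles", "body", true);
        scope.addField("articles", "internal_id", false);
        scope.addField("users", "bio", true);

        // prints [articles.title, articles.body, users.bio]
        System.out.println(scope.fieldsToSearch());

        scope.restrictTo("articles");
        // prints [articles.title, articles.body]
        System.out.println(scope.fieldsToSearch());
    }
}
```

The point is that the "searchable" flag and the table filter are independent
tests: the filter narrows which tables are considered, and the flag still
decides which fields inside them are searched.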

The method "next" iterates through a ResultList (which is automatically
ordered by an unpublished magic metric). The "getTableName" method returns
the table name, but in Netfrastructure this is rarely, if ever, used. The
meat of the class is the method "fetchRecord", which returns a result set
containing the search hit. I have architectural provision to join partial
hits related by primary/foreign key, but it hasn't proven necessary, and I
don't anticipate ever implementing it.
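
The calling pattern can be sketched with an in-memory stand-in, since the
real class is native. Everything here is hypothetical apart from the method
names borrowed from NfsResultList: the classes ResultListDemo, Hit and
InMemoryResultList, the helpers row and summarize, and the sample data are
made up. A Map stands in for NfsResultSet, and because the real ordering
metric is unpublished, I simply assume "highest score first."

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not the Netfrastructure implementation.
public class ResultListDemo {
    // One search hit: source table, relevance score, and the matching row.
    static final class Hit {
        final String table;
        final float score;
        final Map<String, String> row;

        Hit(String table, float score, Map<String, String> row) {
            this.table = table;
            this.score = score;
            this.row = row;
        }
    }

    // Uses the method names of NfsResultList over an in-memory list.
    static final class InMemoryResultList {
        private final List<Hit> hits;
        private int position = -1; // before the first hit, as with a JDBC cursor

        InMemoryResultList(List<Hit> unordered) {
            hits = new ArrayList<>(unordered);
            // Assumption: best score first. The real metric is unpublished.
            hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        }

        int getCount() { return hits.size(); }

        boolean next() {
            if (position + 1 >= hits.size()) {
                return false;
            }
            position++;
            return true;
        }

        String getTableName() { return hits.get(position).table; }
        float getScore() { return hits.get(position).score; }
        Map<String, String> fetchRecord() { return hits.get(position).row; }
        void close() { hits.clear(); }
    }

    // Builds a row from alternating column names and values.
    static Map<String, String> row(String... keysAndValues) {
        Map<String, String> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keysAndValues.length; i += 2) {
            r.put(keysAndValues[i], keysAndValues[i + 1]);
        }
        return r;
    }

    // The typical consumer loop: next(), then fetchRecord() for each hit.
    static List<String> summarize(InMemoryResultList results) {
        List<String> lines = new ArrayList<>();
        while (results.next()) {
            lines.add(results.getTableName() + ": " + results.fetchRecord());
        }
        results.close();
        return lines;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("users", 0.4f, row("name", "Ann")));
        hits.add(new Hit("articles", 0.9f, row("title", "Index structures")));
        for (String line : summarize(new InMemoryResultList(hits))) {
            System.out.println(line);
        }
    }
}
```

Note that the caller never needs to know in advance which table a hit came
from; it just fetches the record and works with whatever columns it has.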

The very strange thing about multi-table/multi-field search is that although it
is the essence of the Internet, it is almost unknown in the database world.
But, hey, until Interbase popularized blobs, they weren't recognized either.




Jim Starkey