Subject | Re: [Firebird-Architect] Re: Full Text Search |
---|---|
Author | Lester Caine |
Post date | 2005-02-07T17:13:51Z |
Roman Rokytskyy wrote:
> Lester, let's assume we have this full-text feature. How do you want
> to use it? Please sketch some queries.

I did at the start :)

> The idea is that one should be able to search for 'CAINE' and match all
> the 'CAIN', 'KANE', 'CANE', 'KINE' equivalents, and then search on
> 'BIRTH' or 'ISLE OF MAN' to further restrict the result set.

I have an assortment of databases and documents forming the results of
searching for family history information. Ideally the whole lot needs to
be massaged into a consistent database with links to each piece of
source information relating to each 'person' record, but just searching
the data is a start.
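The 'CAINE' / 'CAIN' / 'KANE' / 'CANE' / 'KINE' matching in the quoted proposal is a phonetic-key problem. Below is a minimal sketch, assuming a Soundex-style encoding. Standard Soundex keeps the first letter as a letter, so it puts CAINE (C500) and KANE (K500) in different buckets. The hypothetical `phonetic_key` below also codes the first letter as a digit, so C and K (both coded 2) collide. This is an illustrative variant, not a standard algorithm and not anything Firebird provides.

```python
# Hypothetical Soundex-style phonetic key. Unlike standard Soundex, the
# first letter is coded as a digit too, so names starting with C or K
# can share a key. Illustrative only.

CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit


def phonetic_key(name: str, length: int = 4) -> str:
    """Reduce a name to at most `length` digits of consonant-sound codes."""
    digits = []
    prev = ""
    for ch in name.upper():
        if ch in "HW":           # H and W do not separate equal codes (as in Soundex)
            continue
        code = CODES.get(ch, "")
        if not code:             # vowels, Y and non-letters separate equal codes
            prev = ""
            continue
        if code != prev:         # collapse runs of the same code, e.g. "NN"
            digits.append(code)
        prev = code
    return "".join(digits[:length])


variants = ["CAINE", "CAIN", "KANE", "CANE", "KINE"]
print({name: phonetic_key(name) for name in variants})
```

Keys like this trade precision for recall: KENNY also reduces to the same key, which is why a phonetic hit would normally be narrowed by further terms such as 'BIRTH' or 'ISLE OF MAN'.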
I can see an advantage in random searching using a concatenation of a
whole record in order to assist that search. So the result gives the
record number and the field containing the match. This would probably
achieve Jim's aim as well, but is not something I see as essential.
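The record-concatenation idea can be sketched as an inverted index that treats all fields of a record as one searchable unit but keeps, for each word, the record number and field it came from, so a hit reports both. This is a minimal Python sketch; the record layout and the `build_index` and `search` helpers are hypothetical names for illustration, not a Firebird feature or API.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into upper-case alphanumeric words."""
    return re.findall(r"[A-Z0-9]+", text.upper())


def build_index(records):
    """Index every field of every record together, remembering where each word came from.

    `records` maps a record number to {field name: text}; this layout is hypothetical.
    """
    index = defaultdict(set)
    for rec_no, fields in records.items():
        for field, value in fields.items():
            for word in tokenize(value or ""):
                index[word].add((rec_no, field))
    return index


def search(index, word):
    """Return sorted (record number, field) pairs whose text contains the single word `word`."""
    return sorted(index.get(word.upper(), ()))


# Illustrative data only.
records = {
    1: {"SURNAME": "Caine", "EVENT": "Birth", "PLACE": "Peel, Isle of Man"},
    2: {"SURNAME": "Kane", "EVENT": "Marriage", "PLACE": "Liverpool"},
}
index = build_index(records)
print(search(index, "man"))
print(search(index, "birth"))
```

Storing the field name in each posting gives the same hits as searching a concatenated copy of the record, while still reporting which field matched.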
What *I* am looking for is a means of managing the indexing properly, so
that external documents can be scanned and a document ID supplied, which
is returned as part of the query. Many of the points that have been
brought up are fleshing out my own ideas, but I think I am on a
different tack to Jim and probably other people. However I don't see any
difference between indexing internal BLOB and large VARCHAR fields and
external documents - just how we identify them. In any case I can see an
advantage in storing a plain text version of a document in the database
with a link to the original. THAT is something I am starting to do,
using the output of an OCR package to supply the data in many cases. So
creating the concatenated copy of a record could assist that process,
along with 'normalising' any date data located - but that comes later ;)
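The point that internal BLOB or VARCHAR text and external documents differ only in how they are identified can be sketched as one index keyed by a document ID. Each entry records where the original lives and holds the stored plain-text copy (for example OCR output), and a query returns document IDs. The `Document` fields, the source labels and the file path below are hypothetical. Multi-word restrictions such as 'ISLE OF MAN' are treated here as an AND of their words, not as a phrase match.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: int
    source: str     # hypothetical label: 'blob', 'varchar' or 'external'
    location: str   # hypothetical: table/row for internal data, path of the original for external
    text: str       # plain-text copy stored in the database, e.g. OCR output


def tokens(text):
    """Return the set of upper-case alphanumeric words in `text`."""
    return set(re.findall(r"[A-Z0-9]+", text.upper()))


class TextIndex:
    def __init__(self):
        self.postings = {}  # word -> set of doc_ids
        self.docs = {}      # doc_id -> Document

    def add(self, doc):
        """Index a document the same way regardless of where its original lives."""
        self.docs[doc.doc_id] = doc
        for word in tokens(doc.text):
            self.postings.setdefault(word, set()).add(doc.doc_id)

    def query(self, *terms):
        """Return sorted ids of documents containing every word of every term (AND)."""
        wanted = set()
        for term in terms:
            wanted |= tokens(term)
        if not wanted:
            return []
        result = None
        for word in wanted:
            ids = self.postings.get(word, set())
            result = ids if result is None else result & ids
        return sorted(result)


# Illustrative data only; the path and row reference are made up.
idx = TextIndex()
idx.add(Document(1, "external", "scans/parish_register_001.tif",
                 "Baptism of John Caine, born at Peel, Isle of Man"))
idx.add(Document(2, "blob", "NOTES row 42",
                 "Kane family moved to Liverpool after the birth of a second son"))
print(idx.query("CAINE", "ISLE OF MAN"))
print(idx.query("birth"))
```

The returned ID is the link back to the original: `idx.docs[doc_id].location` says whether to fetch a database row or an external file.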
For every useful document there can be hundreds of side-tracks. They are
not needed today, but as the tree grows, re-scanning old documents
supplies missing links. In addition, I like to make my data available to
other people, so they will be doing the same searches for their own links.
--
Lester Caine
-----------------------------
L.S.Caine Electronic Services