Subject | Re: [Firebird-Architect] Re: Full Text Search |
---|---|
Author | Lester Caine |
Post date | 2005-02-06T13:36:01Z |
Jim Starkey wrote:
>>> 9. Search indexing should be html-aware
>>
>>Not only HTML-aware, but also XML, RTF, MS Word, etc. But that is easy
>>to achieve. If that content is stored in the BLOB, we have already a
>>concept of BLOB filter. Just define a "searchable" BLOB type and
>>corresponding "HTML", "PDF", "RTF" BLOB types. The conversion between
>>those datatypes is done by filter.
>
> Oh, my head swims. Netfrastructure has filters for <*ml>, MSWord, and
> PDF. MSWord is a task on the scope of the Vulcan project. PDF has
> better documentation but more intractable problems (hint: you have to
> emulate everything in a laser printer but the paper path and toner
> drum). Nobody's asked for RTF, thank god, but I have a converter on the
> shelf somewhere when they do.

I tend to convert the PDFs to provide a 'plain text' version, and I do
handle RTF in a similar way. Just need the right filter.

> Interesting that you missed the Open Office formats? Hey, Roman, get
> with the program. And what about WordPerfect? And the worst of them
> all, PowerPoint?

They are just filters - and I'll ask again - could they be handled as
BLOB filters or do we need something else? The first step is obviously
taking the data and identifying indexable 'character entities' - coming
up with at least an outline of where we are heading will mean I can
start switching the hard coded stuff in line with it ;)
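
To make the filter idea a bit more concrete, here is a rough sketch of how a "convert to searchable text, then pick out indexable terms" pipeline might hang together. It is only an illustration under stated assumptions: the type labels, the `FILTERS` registry, `to_searchable` and `index_terms` are all hypothetical names, and none of this is Firebird's actual BLOB filter interface. Only an HTML filter is shown; PDF or RTF converters would be registered the same way.

```python
import re
from html.parser import HTMLParser

# Hypothetical type labels, used here only for illustration.
HTML, RTF, SEARCHABLE = "html", "rtf", "searchable"


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops the markup around them."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(blob):
    """Filter: HTML bytes in, plain text out (script/style text is not stripped)."""
    parser = _TextExtractor()
    parser.feed(blob.decode("utf-8", errors="replace"))
    parser.close()  # flush any text still buffered by the parser
    return " ".join(parser.parts)


# Registry mapping (source type, target type) to a conversion function.
FILTERS = {(HTML, SEARCHABLE): html_to_text}


def to_searchable(blob, source_type):
    """Look up the filter for this source type and apply it."""
    convert = FILTERS.get((source_type, SEARCHABLE))
    if convert is None:
        raise LookupError("no filter from %r to searchable text" % source_type)
    return convert(blob)


def index_terms(text):
    """Lower-cased word tokens, a crude stand-in for 'character entities'."""
    return re.findall(r"\w+", text.lower())


if __name__ == "__main__":
    print(index_terms(to_searchable(b"<p>Full <b>Text</b> Search</p>", HTML)))
```

The point of the registry is that the indexer only ever sees the "searchable" form, so supporting another format means adding one more entry rather than touching the indexing code.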
--
Lester Caine
-----------------------------
L.S.Caine Electronic Services