Subject: Re: Converting A Blob To A String Within A SP
Author: martinthrelly
--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@t...>
wrote:
>
> At 02:51 AM 9/11/2005 +0000, you wrote:
>
> >hello helen. let me describe what i want to do with a simplified
> >example. sorry if this is a bit long but hopefully you will get the
> >gist of what i am trying to do.
> >
> >lets say i have a table called Clients with 3 columns as such:
> >
> >ID integer
> >Name varchar(32)
> >Notes Blob
> >
> >so i insert a row into the table like this
> >
> >ID: 1
> >Name: 'John Smith'
> >Notes: 'Keen chess player...etc'
> >
> >so now i have inserted this row i want to take each word and
> >populate a SEARCH_WORDS table in order to facilitate a full text
> >search.
> >
> >so from my insert trigger i call an SP which makes a long blob
> >consisting of all the words in my searchable fields. in this
> >case 'Name' and 'Notes'. like this:
> >
> >declare variable KEYWORDS BLOB SUB_TYPE 1 SEGMENT SIZE 80;
> >keywords = :Name || ' ' || :Notes;
> >
> >so now i have a new blob containing all my keywords. now in order to
> >populate my SEARCH_WORDS table i must separate this keywords blob
> >into individual WORDS of type varchar(32). so i send this new blob
> >off to a second SP called SP_BLOB_SPLITTER.
> >
> >so i do something like this:
> >
> >for select WORDS from SP_BLOB_SPLITTER (:keywords) into :word
> >do begin
> >  insert into SEARCH_WORDS (word) values (:word); ...etc...
> >
> >in order to achieve this my SP_BLOB_SPLITTER must therefore accept a
> >blob and somehow return a list of words within this blob delimited
> >by a ' '. i have achieved this with an input parameter of varchar
> >(32000) by iterating over each character. but how can i do this with
> >a blob? how can i iterate over each character within a blob without
> >first making a varchar? and how can i make a varchar when the blob
> >might be too big? this is my problem.
>
> Honestly, I don't think it's your problem at all. I think your problem is
> that you are providing a horrendously complicated infrastructure of
> strings that you're hoping to parse effectively using just SQL. My advice
> is: forget it. Programming languages do string parsing a lot more
> effectively. Relational databases do relationships very nicely. Render
> unto Caesar and all that jazz.
>
> Use SUBSTRING in your SP to extract strings from the blob. Write a parsing
> engine in C/C++ or Pascal that does what you want: accepts these string
> "chunks", parses out single words and makes sense of them as "keywords" and
> passes back candidate keywords to the SP. Export the engine as one or more
> UDFs. Then let the database engine work out what to do with the keyword by
> testing keywords it already has and associations that it has already stored.
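
the tricky part of that chunk-wise approach is a word cut in half at a chunk boundary. the real engine would be a C/Pascal UDF as helen says; this is just the core carry-over logic sketched in python for clarity (the function name and the 8-character chunk size are made up):

```python
# Sketch of chunk-wise word parsing: the SP pulls fixed-size pieces out
# of the blob with SUBSTRING and hands each piece to the parser, which
# must remember any word cut off at a chunk boundary and finish it with
# the start of the next chunk.

def split_words_from_chunks(chunks):
    """Yield complete space-delimited words from an iterable of string chunks."""
    carry = ""  # partial word left over from the previous chunk
    for chunk in chunks:
        pieces = (carry + chunk).split(" ")
        # the last piece may be an unfinished word; keep it for later
        carry = pieces.pop()
        for word in pieces:
            if word:  # skip empty strings produced by repeated spaces
                yield word
    if carry:
        yield carry  # the blob ended mid-word; flush the remainder

# simulate SUBSTRING(blob FROM n FOR 8) on a small "blob"
blob = "Keen chess player and keen angler"
chunks = [blob[i:i + 8] for i in range(0, len(blob), 8)]
print(list(split_words_from_chunks(chunks)))
# → ['Keen', 'chess', 'player', 'and', 'keen', 'angler']
```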
>
> And this idea of storing something like "Joe Bloggs: fishing" as a keyword
> entry isn't very scientific.
>
> Store the documents in one table with whatever identifying data you
> need. Store keywords in a primitive table with no affiliations to
> documents at all. Then link the documents to the keywords table by an
> intersection table that has foreign keys to both the document and the
> keywords tables.
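
a runnable sketch of that three-table design (table and column names are invented, and sqlite stands in for firebird purely to keep the example self-contained):

```python
# documents / keywords / intersection design: keywords live in their own
# table with no ties to documents; doc_keywords links the two.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE keywords  (kw_id  INTEGER PRIMARY KEY, word TEXT UNIQUE);
    -- intersection table: one row per (document, keyword) pairing
    CREATE TABLE doc_keywords (
        doc_id INTEGER REFERENCES documents (doc_id),
        kw_id  INTEGER REFERENCES keywords  (kw_id),
        PRIMARY KEY (doc_id, kw_id)
    );
""")
con.execute("INSERT INTO documents VALUES (1, 'John Smith notes')")
for word in ("keen", "chess", "player"):
    con.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)", (word,))
    con.execute("""INSERT INTO doc_keywords
                   SELECT 1, kw_id FROM keywords WHERE word = ?""", (word,))

# find every document tagged with the keyword 'chess'
rows = con.execute("""
    SELECT d.title FROM documents d
    JOIN doc_keywords dk ON dk.doc_id = d.doc_id
    JOIN keywords k ON k.kw_id = dk.kw_id
    WHERE k.word = 'chess'
""").fetchall()
print(rows)  # [('John Smith notes',)]
```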
>
> This is an over-simplification of what's a really complex thing to do. For
> example, simply splitting out single words out of context doesn't do much
> for meaning. Complex tasks need smart languages.
>
> ./heLen
>

thanks for the reply helen. the words "square peg" and "round hole"
come to mind. i have now come full circle and i am thinking about
going one step further and porting the index and full text search
documents out to an external file directory using DotLucene.

i can then write a UDF which communicates with DotLucene. i can call
that with an sql like "select id from MYTABLE where id in (select *
from dotlucene where hits = 'joe' or hits = 'bloggs')".

something like that. this way i port my full text search module away
from the database, which is good i think because it will perform
better and also it blatantly doesn't belong there in terms of its
structure.

the problem with having an external directory is concurrency, but i
think i can manage this via a queue which is fed items by my
insert/update triggers.
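
the shape of that queue idea, sketched in python (DotLucene and the trigger hookup are assumptions here, not shown): inserts/updates push document ids onto a queue and a single indexer worker drains it, so only one writer ever touches the external index at a time.

```python
# Trigger-fed indexing queue: many producers, one consumer, so writes
# to the external full text index are serialised.
import queue
import threading

index_queue: "queue.Queue" = queue.Queue()
indexed = []  # stand-in for the external DotLucene index

def indexer_worker():
    # sole writer: serialises all updates to the external index
    while True:
        doc_id = index_queue.get()
        if doc_id is None:  # sentinel: shut down
            break
        indexed.append(doc_id)  # really: re-index the document
        index_queue.task_done()

worker = threading.Thread(target=indexer_worker)
worker.start()

# pretend these arrive from insert/update triggers
for doc_id in (1, 2, 3):
    index_queue.put(doc_id)

index_queue.put(None)  # tell the worker to stop
worker.join()
print(indexed)  # [1, 2, 3]
```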

thanks again for the reply