Subject Re: [firebird-support] Re: Converting A Blob To A String Within A SP
Author Helen Borrie
At 02:51 AM 9/11/2005 +0000, you wrote:

>hello helen. let me describe what i want to do with a simplified
>example. sorry if this is a bit long but hopefully you will get the
>gist of what i am trying to do.
>
>lets say i have a table called Clients with 3 columns as such:
>
>ID integer
>Name varchar(32)
>Notes Blob
>
>so i insert a row into the table like this
>
>ID: 1
>Name: 'John Smith'
>Notes: 'Keen chess player...etc'
>
>so now i have inserted this row i want to take each word and
>populate a SEARCH_WORDS table in order to facilitate a full text
>search.
>
>so from my insert trigger i call an SP which makes a long blob
>consisting of all the words in my searchable fields. in this
>case 'Name' and 'Notes. like this:
>
>declare variable KEYWORDS BLOB SUB_TYPE 1 SEGMENT SIZE 80;
>keywords = :Name || ' ' || :Notes;
>
>so now i have a new blob containing all my keywords. now in order to
>populate my SEARCH_WORDS table i must seperate this keywords blob
>into individual WORDS of type varchar(32). so i send this new blob
>off to a second SP called SP_BLOB_SPLITTER.
>
>so i do somthing like this:
>
>for select WORDS from SP_BLOB_SPLITTER (:keywords) into word
>do begin
>insert into SEARCH_WORDS(:word); ...etc...
>
>in order to achieve this my SP_BLOB_SPLITTER must therefore accept a
>blob and somehow return a list of words within this blob deliminated
>by a ' '. i have achieved this with an input parameter of varchar
>(32000) by iterating over each character. but how can i do this with
>a blob? how can i iterate over each character within a blob without
>first making a varchar. and how can i make a varchar when the blob
>might be too big? this is my problem.

Honestly, I don't think it's your problem at all. I think your problem is
that you are providing a horrendously complicated insfrastructure of
strings that you're hoping to parse effectively using just SQL. My advice
is: forget it. Programming languages do string parsing a lot more
effectively. Relational databases do relationships very nicely. Render
unto Caesar and all that jazz.

Use SUBSTRING in your SP to extract strings from the blob. Write a parsing
engine in C/C++ or Pascal that does what you want: accepts these string
"chunks", parses out single words and makes sense of them as "keywords" and
passes back candidate keywords to the SP. Export the engine as one or more
UDFs. Then let the database engine work out what to do with the keyword by
testing keywords it already has and associations that it has already stored.

And this idea of storing something like "Joe Bloggs: fishing" as a keyword
entry isn't very scientific.

Store the documents in one table with whatever identifying data you
need. Store keywords in a primitive table with no affiliations to
documents at all. Then link the documents to the keywords table by an
intersection table that has foreign keys to both the document and the
keywords tables.

This is an over-simplification of what's a really complex thing to do. For
example, simply splitting out single words out of context doesn't do much
for meaning. Complex tasks need smart languages.

./heLen