Subject | Re: [firebird-support] rtf-to-plaintext udf? |
---|---|
Author | Urs Liska |
Post date | 2005-01-31T12:54:07Z |
Hello Alan,
thanks for your various replies.
I will have a closer look at trichview.com. At first glance it looks
very nice. But I'll have to find out whether I need all the features I
would have to pay for ;-).
What I really have to think about is your comment about the client side
and the time spent in a UDF. It would be easier to do the transformation
in an application, but the database relies on it.
What I want to do is:
I have several blob fields containing rtf text.
For a kind of full text search engine I have a table with words that are
extracted from several fields on INSERT and UPDATE.
For the VARCHAR fields and the plain text blobs this is already solved
(although probably not very efficiently).
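To make the word-table idea concrete, here is a minimal sketch of the extraction step. It is written in Python purely as an illustration and is not the UDF used in this database; the function name `extract_words` and the `min_len` parameter are hypothetical.

```python
import re

# Runs of letters only; digits, underscores and punctuation separate words.
WORD_RE = re.compile(r"[^\W\d_]+")

def extract_words(text, min_len=2):
    """Return the distinct words of `text`, lowercased and sorted.

    Hypothetical helper: each returned word would become one row
    in the word table that the full text search reads.
    """
    words = {m.group(0).lower() for m in WORD_RE.finditer(text)}
    return sorted(w for w in words if len(w) >= min_len)
```

Run against raw rtf, a splitter like this would also pick up control words such as `par` or `fonttbl`, which is why the input has to be plain text.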
What I don't know yet is how to extract the words from rtf fields, i.e.
how to transform the rtf to plain text.
As the extraction UDFs will get wrong results when run against rtf text,
the database depends on getting plain text.
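One way to do the transformation without a RichEdit control is to walk the rtf tokens directly. The following is a rough sketch in Python, not Delphi and not a complete RTF parser: the list of skipped destinations is an assumption and incomplete, `\uN` Unicode escapes are not handled, and the name `rtf_to_text` and the default `cp1252` code page are illustrative choices.

```python
import re

# Destinations whose content is not document text (assumed, incomplete list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict",
                     "header", "footer", "generator"}
# Control words that stand for a character in the output.
SPECIAL = {"par": "\n", "line": "\n", "tab": "\t"}

TOKEN_RE = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"    # control word, optional number, optional space
    r"|\\'([0-9a-fA-F]{2})"    # hex-escaped byte, e.g. \'fc
    r"|\\([^a-z])"             # control symbol, e.g. \{ or \*
    r"|([{}])"                 # group open / close
    r"|[\r\n]+"                # raw line breaks carry no meaning in rtf
    r"|(.)"                    # any other character is literal text
)

def rtf_to_text(rtf, encoding="cp1252"):
    """Strip rtf markup from `rtf` and return the visible text."""
    out = []
    stack = []        # saved 'ignoring' flag for each open group
    ignoring = False  # True while inside a skipped destination
    for m in TOKEN_RE.finditer(rtf):
        word, _arg, hexbyte, symbol, brace, char = m.groups()
        if brace == "{":
            stack.append(ignoring)
        elif brace == "}":
            if stack:
                ignoring = stack.pop()
        elif symbol is not None:
            if symbol == "*":
                ignoring = True            # \* marks an optional destination
            elif symbol in "\\{}" and not ignoring:
                out.append(symbol)         # escaped literal character
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                ignoring = True
            elif word in SPECIAL and not ignoring:
                out.append(SPECIAL[word])
        elif hexbyte is not None:
            if not ignoring:
                out.append(bytes.fromhex(hexbyte).decode(encoding, "replace"))
        elif char is not None and not ignoring:
            out.append(char)
    return "".join(out)
```

Because this is plain string processing with no window handle or GUI component involved, the same logic could in principle be ported to a UDF or kept in the application.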
The only alternative I can think of, if the extraction is not implemented
in a UDF, is to have an additional plain text field and "force" every
application to do the transformation by defining this field as NOT NULL.
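The NOT NULL idea can be sketched as follows. SQLite (from the Python standard library) stands in for Firebird here only so the example is runnable; the table `documents` and its columns are hypothetical names, not the actual schema.

```python
import sqlite3

# SQLite stands in for Firebird; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " body_rtf BLOB NOT NULL,"
    " body_plain TEXT NOT NULL)"  # every writer must supply plain text
)

def save_document(conn, body_rtf, body_plain):
    """Store the rtf together with the plain text the application produced."""
    cur = conn.execute(
        "INSERT INTO documents (body_rtf, body_plain) VALUES (?, ?)",
        (body_rtf, body_plain),
    )
    return cur.lastrowid

def try_save_without_plain_text(conn, body_rtf):
    """Return True if the database accepted a row lacking plain text."""
    try:
        conn.execute("INSERT INTO documents (body_rtf) VALUES (?)", (body_rtf,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Note that the constraint only guarantees that some value is present. It does not stop an application from writing an empty string or plain text that no longer matches the rtf.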
Given this scenario, do you still think the UDF would be the wrong place
to do the transformation?
Urs
Alan McDonald wrote:
>>Hello,
>>
>>I have several blob fields in my database that contain rtf text.
>>To parse the contents and split it into words I am looking for a method
>>to extract the plain text from the blobs.
>>For this I think I have to write a UDF (with Delphi).
>>I asked about this in a Delphi forum and got the proposal to use the
>>TRichEdit component (that is a Wrapper around Windows' RichEdit control).
>>I am not happy about this because I think it is too much overhead to
>>create an instance of such a component in a UDF, because it can happen
>>very often (for each record of course).
>>But since it was the only idea, I tried it, with the result that the
>>Firebird server crashes (not just occasionally but every time the UDF is
>>executed).
>>
>>The code used to get the blob data into the richedit is:
>>
>> if (not Assigned(aBlob))
>> or ( aBlob^.TotalSize = 0)
>> then exit;
>> len := aBlob^.TotalSize + 1;
>> rtf := TRichEdit.Create(nil); //rtf is TRichEdit
>> buffer := StrAlloc(len);
>> str := TStringStream.Create('');
>> try
>> aBlob^.GetSegment(aBlob^.BlobHandle, buffer, len, bytesRead);
>> rtf.PlainText := false;
>> rtf.text := buffer;
>>
>>Then I want to switch to plaintext and write it to a stream and then to
>>result:
>>
>> rtf.PlainText := true;
>> rtf.Lines.SaveToStream(str);
>> result := ib_util_malloc(length(rtf.text) + 1);
>> ZeroMemory(result, length(rtf.text) + 1);
>> result := resultString(PChar(str.dataString), str.Size + 1);
>>
>> finally
>> StrDispose(buffer);
>> rtf.Free;
>> str.Free;
>> end;
>>
>>It seems that the line "rtf.Plaintext := true" causes the server to crash.
>>
>>Can anybody tell me
>>a)
>>whether there is a problem with this approach
>>b)
>>whether the problem arises just from wrong handling (coding mistakes)
>>c)
>>whether there is a totally different approach to get the plain text out
>>of a rtf blob.
>>
>>Thank you very much
>>Urs
>
>
> I do something similar but I do at the client side. I wouldn't do this at
> the back end at all, sounds like a time-waster ( I mean a task which is too
> long for a UDF to perform acceptably ) and also a problem with catching and
> dealing with exceptions which are better placed at the client side.
> I have a richtext binary blob field and a plain text blob field. At the
> client, onPost event, I convert the text and assign it to the plain text
> field without the user knowing. This helps me with report writers which
> don't handle the native RVF format of TRichView.
> Alan
>