Subject Re: [firebird-support] rtf-to-plaintext udf?
Author Ivan Prenosil
* perhaps the problem is somewhere else, before the PlainText property is assigned.
E.g. do you have a correct definition of the blob structure and the get segment function?

* the blob can be segmented, so you need to call getsegment in a loop
until the whole blob has been read

*
> result := ib_util_malloc(length(rtf.text) + 1);
> ZeroMemory(result, length(rtf.text) + 1);
> result := resultString(PChar(str.dataString), str.Size + 1);

Aren't you overwriting the pointer allocated by ib_util_malloc?

* TRichEdit is a visual component, which is probably not well suited
for use in a UDF. Shouldn't you also assign the Parent property?

* it has been many years since I worked with the rich edit component,
so I have already forgotten the problems I had with it, but perhaps
you can find some useful info here:
http://home.att.net/~robertdunn/Yacs.html

* I would parse plain text fields in a UDF, but convert more complex
formats to plain text on the client. You may want to support more
kinds of data later, like Word documents, and calling any Word
functions in the FB server from a UDF is not a good idea ...

You may also want to support a Linux server later, and then
you will again have problems with UDFs based on Delphi/Win32 components.
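
A rough sketch of the two points above about getsegment and ib_util_malloc,
not a tested implementation. The record layout is my assumption of how the
blobcallback struct from Firebird's ibase.h maps to Delphi (16-bit buffer size
and read length), and BlobToCString is a hypothetical name. It only copies the
raw blob bytes; the RTF conversion itself is left out. The function would have
to be declared with FREE_IT so that the server releases the buffer.

```pascal
unit BlobTextSketch;

interface

type
  // Assumed to mirror "struct blobcallback" in ibase.h; verify against your headers.
  TBlobGetSegment = function(Handle: Pointer; Buffer: PAnsiChar;
    BufferSize: Word; var BytesRead: Word): SmallInt; cdecl;

  PBlob = ^TBlob;
  TBlob = record
    GetSegment: TBlobGetSegment;
    BlobHandle: Pointer;
    SegmentCount: Integer;
    MaxSegmentLength: Integer;
    TotalSize: Integer;
    PutSegment: Pointer;   // not used in this sketch
    Seek: Pointer;         // not used in this sketch
  end;

// Hypothetical helper: returns the raw blob content as a zero-terminated string.
function BlobToCString(aBlob: PBlob): PAnsiChar; cdecl;

implementation

function ib_util_malloc(Size: Integer): Pointer; cdecl; external 'ib_util.dll';

function BlobToCString(aBlob: PBlob): PAnsiChar; cdecl;
var
  Total, Want: Integer;
  Got: Word;
begin
  Result := nil;
  if (aBlob = nil) or (aBlob^.BlobHandle = nil) or (aBlob^.TotalSize <= 0) then
    Exit;

  // Allocate once and never reassign Result afterwards.
  Result := ib_util_malloc(aBlob^.TotalSize + 1);
  if Result = nil then
    Exit;

  Total := 0;
  while Total < aBlob^.TotalSize do
  begin
    // The buffer size parameter is 16-bit, so ask for at most 65535 bytes per call.
    Want := aBlob^.TotalSize - Total;
    if Want > High(Word) then
      Want := High(Word);

    Got := 0;
    // A return value of 0 means there is nothing more to read.
    if aBlob^.GetSegment(aBlob^.BlobHandle, Result + Total, Word(Want), Got) = 0 then
      Break;
    if Got = 0 then
      Break;
    Inc(Total, Got);
  end;

  // Terminate after the bytes actually read.
  Result[Total] := #0;
end;

end.
```

If the text is converted before it is returned, read the blob into a temporary
buffer first, and call ib_util_malloc only once the final length is known.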

Ivan
http://www.volny.cz/iprenosil/interbase/

----- Original Message -----
From: "Urs Liska" <firebird@...>
To: <firebird-support@yahoogroups.com>
Sent: Friday, January 28, 2005 9:32 PM
Subject: [firebird-support] rtf-to-plaintext udf?


>
> Hello,
>
> I have several blob fields in my database that contain RTF text.
> To parse the contents and split them into words I am looking for a method
> to extract the plain text from the blobs.
> For this I think I have to write a UDF (with Delphi).
> I asked about this in a Delphi forum and got the proposal to use the
> TRichEdit component (a wrapper around Windows' RichEdit control).
> I am not happy about this because I think it is too much overhead to
> create an instance of such a component in a UDF, because it can happen
> very often (for each record, of course).
> But since it was the only idea, I tried it, with the result that the
> Firebird server crashes (not just occasionally but every time the UDF is
> executed).
>
> The code used to get the blob data into the richedit is:
>
> if (not Assigned(aBlob))
> or ( aBlob^.TotalSize = 0)
> then exit;
> len := aBlob^.TotalSize + 1;
> rtf := TRichEdit.Create(nil); //rtf is TRichEdit
> buffer := StrAlloc(len);
> str := TStringStream.Create('');
> try
> aBlob^.GetSegment(aBlob^.BlobHandle, buffer, len, bytesRead);
> rtf.PlainText := false;
> rtf.text := buffer;
>
> Then I want to switch to plaintext and write it to a stream and then to
> result:
>
> rtf.PlainText := true;
> rtf.Lines.SaveToStream(str);
> result := ib_util_malloc(length(rtf.text) + 1);
> ZeroMemory(result, length(rtf.text) + 1);
> result := resultString(PChar(str.dataString), str.Size + 1);
>
> finally
> StrDispose(buffer);
> rtf.Free;
> str.Free;
> end;
>
> It seems that the line "rtf.Plaintext := true" causes the server to crash.
>
> Can anybody tell me
> a)
> whether there is a problem with this approach
> b)
> whether the problem arises just from wrong handling (coding mistakes)
> c)
> whether there is a totally different approach to get the plain text out
> of an RTF blob.
>
> Thank you very much
> Urs