Subject rtf-to-plaintext udf?
Author Urs Liska
Hello,

I have several blob fields in my database that contain RTF text.
To parse the contents and split them into words, I am looking for a way
to extract the plain text from the blobs.
For this I think I have to write a UDF (in Delphi).
I asked about this in a Delphi forum and was advised to use the
TRichEdit component (a wrapper around the Windows RichEdit control).
I am not happy with this because creating an instance of such a
component inside a UDF seems like too much overhead: it would happen
very often (once for each record).
But since it was the only idea I had, I tried it. The result is that the
Firebird server crashes, not just occasionally but every time the UDF is
executed.

The code used to get the blob data into the richedit is:

  if (not Assigned(aBlob)) or (aBlob^.TotalSize = 0) then
    Exit;
  len := aBlob^.TotalSize + 1;
  rtf := TRichEdit.Create(nil); // rtf is TRichEdit
  buffer := StrAlloc(len);
  str := TStringStream.Create('');
  try
    aBlob^.GetSegment(aBlob^.BlobHandle, buffer, len, bytesRead);
    rtf.PlainText := false;
    rtf.Text := buffer;

Then I want to switch to plain text, write it to a stream, and copy that
to the result:

    rtf.PlainText := true;
    rtf.Lines.SaveToStream(str);
    result := ib_util_malloc(Length(rtf.Text) + 1);
    ZeroMemory(result, Length(rtf.Text) + 1);
    result := resultString(PChar(str.DataString), str.Size + 1);
  finally
    StrDispose(buffer);
    rtf.Free;
    str.Free;
  end;

It seems to be the line "rtf.PlainText := true" that makes the server crash.

Can anybody tell me
a) whether there is a problem with this approach in general,
b) whether the problem just comes from coding mistakes on my side, or
c) whether there is a totally different approach to get the plain text
   out of an RTF blob?
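
To illustrate what (c) could look like, here is a minimal sketch of a
hand-written RTF stripper that needs no visual component at all.
RtfToPlainText is a hypothetical helper written for this post, not an
existing library function. It assumes single-byte (ANSI) RTF, only
understands \par, \line, \tab, \'hh and the escaped characters \\ \{ \},
skips a handful of well-known destination groups, and does not handle
\u Unicode escapes, \bin data or code page conversion.

```pascal
program RtfStripDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (an assumption for illustration): removes common RTF
// markup from Rtf and returns whatever text is left.
function RtfToPlainText(const Rtf: AnsiString): AnsiString;
var
  i, n, depth, skipDepth: Integer;
  ctrl: AnsiString;
  c: AnsiChar;
begin
  Result := '';
  n := Length(Rtf);
  i := 1;
  depth := 0;
  skipDepth := 0; // > 0 while inside a group whose text is ignored
  while i <= n do
  begin
    c := Rtf[i];
    Inc(i);
    case c of
      '{': Inc(depth);
      '}':
        begin
          // leaving the group that started the skipping ends the skipping
          if depth = skipDepth then skipDepth := 0;
          Dec(depth);
        end;
      '\':
        begin
          if i > n then Break;
          c := Rtf[i];
          if c in ['a'..'z', 'A'..'Z'] then
          begin
            // control word: letters, optional signed number, optional space
            ctrl := '';
            while (i <= n) and (Rtf[i] in ['a'..'z', 'A'..'Z']) do
            begin
              ctrl := ctrl + Rtf[i];
              Inc(i);
            end;
            if (i <= n) and (Rtf[i] = '-') then Inc(i);
            while (i <= n) and (Rtf[i] in ['0'..'9']) do Inc(i);
            if (i <= n) and (Rtf[i] = ' ') then Inc(i);
            if skipDepth = 0 then
            begin
              if (ctrl = 'par') or (ctrl = 'line') then
                Result := Result + #13#10
              else if ctrl = 'tab' then
                Result := Result + #9
              else if (ctrl = 'fonttbl') or (ctrl = 'colortbl') or
                      (ctrl = 'stylesheet') or (ctrl = 'info') or
                      (ctrl = 'pict') then
                skipDepth := depth; // ignore this whole group
            end;
          end
          else if c = '''' then
          begin
            // \'hh: one character given as two hex digits
            if (skipDepth = 0) and (i + 2 <= n) then
              Result := Result +
                AnsiChar(StrToIntDef('$' + string(Copy(Rtf, i + 1, 2)), 32));
            Inc(i, 3);
          end
          else if c = '*' then
          begin
            // \* marks an optional destination: ignore its group
            if skipDepth = 0 then skipDepth := depth;
            Inc(i);
          end
          else
          begin
            // control symbol: keep escaped \ { }, drop everything else
            if (skipDepth = 0) and (c in ['\', '{', '}']) then
              Result := Result + c;
            Inc(i);
          end;
        end;
      #10, #13: ; // raw line breaks carry no meaning in RTF
    else
      if skipDepth = 0 then Result := Result + c;
    end;
  end;
end;

begin
  WriteLn(RtfToPlainText(
    '{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 Hello \b bold\b0  world\par}'));
end.
```

In a UDF, the idea would be to read the blob into a string, pass it
through such a function, and return the result, so no window handle or
VCL control is ever created inside the server process.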

Thank you very much
Urs