Subject PDFS [was: Full Text Search]
Author Jim Starkey
paulruizendaal wrote:

>
>
>>You're unlikely to find a PDF filter, however. I've surveyed most
>>of the open source world looking for a decent one. There are a
>>few, but they don't work very well.
>>
>>
>
>Well, I have found this stuff to work well enough for my purposes:
>http://www.foolabs.com/xpdf/download.html
>http://www.glyphandcog.com/Xpdf.html
>(look at e.g. the pdftotext tool) Perhaps you excluded GPL'ed stuff
>from your search?
>
>
No, I was just looking for ways that other people attacked problems --
technology, you know.

We're way off the topic, but this is the problem. The PDF spec is a
good spec; a little terse, but tight. Spec aside, Acrobat, the
definitive viewer, will accept almost anything. As a result, a high
proportion of non-Acrobat produced PDF documents are invalid but
hammered to get through Acrobat looking right. Adobe doesn't publish
what they do with invalid documents, and why should they? But that
leaves everybody with non-Adobe PDF software trying to cope
intelligently with an infinite variety of bad PDF documents. Making a
spec-conforming PDF eater is not that bad -- PDFs have a certain obese,
twisted elegance about them. But making a PDF eater that is a
work-alike to Acrobat drives a gentle-mannered engineer to take his
frustrations out on well-meaning open source project members (let's leave
names out of this). One of my customers submitted a PDF from one of his
clients that started with a capital "T" scaled to stretch from New
England to Chicago with the lower serif in the Gulf of Mexico
(coordinates in PDF are floating point). Acrobat said to itself, that's
silly, I'll just use a stock size. My converter went
"arrrrrrrgggggggggggghhhhhhhhh!!!!!!!".
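
To make the "stock size" idea concrete, here is a minimal sketch of that
sort of leniency. It is a guess at the general approach, not what Acrobat
actually does (Adobe doesn't say). The names sane_font_size, MAX_FONT_SIZE
and STOCK_FONT_SIZE and both thresholds are made up for illustration;
nothing here comes from the PDF spec.

```python
import math

# Hypothetical limits, chosen only for illustration.
MAX_FONT_SIZE = 2000.0   # points; assumed ceiling for a plausible glyph
STOCK_FONT_SIZE = 12.0   # points; assumed fallback ("stock") size


def sane_font_size(requested: float) -> float:
    """Return the requested size if it looks plausible, else the stock size.

    A size is treated as implausible when it is NaN, infinite, zero,
    negative, or larger than MAX_FONT_SIZE. A real converter would have to
    decide separately how to handle negative sizes; this sketch just
    rejects them.
    """
    if not math.isfinite(requested):
        return STOCK_FONT_SIZE
    if requested <= 0.0 or requested > MAX_FONT_SIZE:
        return STOCK_FONT_SIZE
    return requested
```

A converter would run every size it parses through a check like this
before laying out text, so a continent-sized "T" degrades to ordinary
type instead of blowing up the page geometry.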

Thanks for the references. A customer is threatening to pay me to beef
up my PDF converter. I need all the help I can get.

