PDF Conversion Programs – Which is best for Translators?

January 17th, 2008 | by Naomi de Moraes

A recent LTD list discussion, started by member Betty Welker, asked which software is best for converting PDF files into MS Word documents (see the LTD Mailing List for information on joining if you are an ATA member). The principal products seem to be ABBYY FineReader and Nuance OmniPage. Both companies also sell less expensive tools with fewer features (ABBYY PDF Transformer and Nuance PDF Converter).

I have PDF Converter (by Nuance), but I probably chose it based on price alone. At the time, I was more interested in marking up (editing) PDF files and converting Word files into PDF files.

Most list contributors felt that ABBYY FineReader was better. It has a spell check feature, does not create as many text boxes, and recognizes various languages well. One contributor stated that ABBYY FineReader recognized German better than OmniPage. I have found that PDF Converter recognizes Portuguese well, but I have not tested the others. One contributor asked if anyone has had any experience with ABBYY FineReader for Lithuanian and Russian (it seems OmniPage recognizes these languages well). If you do, please comment below!

It seems ABBYY FineReader has four options for saving to MS Word, each retaining less of the original formatting than the last, while PDF Converter (the cheaper Nuance offering) has only one output format.

All of these programs have difficulty with tables and handwriting. I wish there were some way for me to tell PDF Converter to ignore images and handwriting and simply give me the text, maybe even without formatting. Unfortunately, translators are probably the only users who would be interested in this kind of output!

Does anyone know of any decent free (perhaps online) pdf–>text converters? I once downloaded a free text–>pdf converter and it did not work very well.

One user complained about ABBYY’s customer support, describing it as “pathetic”.

ABBYY FineReader costs about $250, but ABBYY PDF Transformer (which creates and converts PDF files) is only about $100. No one mentioned PDF Transformer, so I am not sure which capabilities are common to the less expensive and more expensive ABBYY tools. PDF Converter 4 costs about $50, while OmniPage costs about $150.

I am curious if any of these programs can deal with tif files. I actually receive far more tif files than pdf files.

If you have anything to add, or can answer any of the questions above, please comment.

Naomi de Moraes

  1. Ted Wozniak Says:

    FineReader 9.0 recognizes most common image file formats (GIF, JPEG, TIFF, etc.). After reading the file, you can fine-tune the recognition (e.g. change a text block to a table, merge cells in a table) and remove recognition areas (e.g. signatures).

    Since I don’t have a German spellchecker in Office, I also use ABBYY to spellcheck the text before exporting to the desired output file.

  2. Michael W. Says:

  3. Jill Sommer Says:

    ABBYY was developed by the makers of ABBYY Lingvo 1.0, an electronic English-Russian-English dictionary, and its headquarters are in Moscow (http://www.abbyy.com/company/). It definitely recognizes Russian, as well as 178 other languages (http://finereader.abbyy.com/?param=137522).

  4. H. Wang Says:

    I have dealt with a lot of PDF files containing tables. I tested both OmniPage and FineReader, and I found that FineReader actually did a better job… I recently upgraded to FineReader 9 and am very happy with it. It recognizes tables very well. I have a lot of multiple-page, spreadsheet-like tables, and the amazing thing is that FineReader is able to combine all the content into ONE worksheet in Excel. That saves me a lot of copying and pasting, because OmniPage puts each page in a separate worksheet. BTW, you only have to pay $179 when upgrading from other OCR software to FineReader. With the accuracy it provides, it is not a bad deal at all.

  5. Steven Marzuola Says:

    I have been using FineReader since 2001, and upgraded to version 8.0 last fall. It’s great software, but in my experience, it’s almost never a good idea to let FR attempt to recognize all the text by itself. Instead, I spend a few seconds per page drawing text boxes and, where necessary, defining the images and tables. Doing so eliminates almost all text boxes. I haven’t used OmniPage, but I never felt I was missing anything.

