Patricia Fierro

PDF and TIFF files


By Patricia Fierro. Submitted on January 2, 2007

About the author: Patricia Fierro is a software engineer with 15 years of translation experience. Spanish native speaker from Ecuador who lived in the US for fourteen years. English > Spanish and Portuguese > Spanish translator.



We will describe an alternative way of converting PDF (Adobe Acrobat) files into files that can be read by Microsoft ® Office Word. This method can be used if you have either Microsoft ® Office XP or Microsoft ® Office 2003.

We realize that there are already several articles that describe tools and software that can read PDF files or convert them by using OCR (Optical Character Recognition). We hope that you will find the following method useful for Acrobat files when other methods have not worked.

PDF files that were created by scanning generally contain only images of the pages, so their text cannot be extracted directly and has to be recognized by OCR. Some scanners automatically create PDF files. Scanning documents is nowadays more common than sending them by fax or by snail mail, and sometimes you cannot ask the client to send you a Microsoft Word document.

The following method can be used to convert .PDF text files to MS Word files. Unfortunately, it will not convert images or tables.

We have used Adobe ® Acrobat 6.0.

The method is:

  1. Open Adobe Acrobat (the full product rather than the free Reader, which may not offer the TIFF option used in step 3).
  2. Open the PDF file that you want to convert.
  3. Save the PDF file as a TIFF file by going to File > Save As. Click on the dropdown list (Save As Type) and choose TIFF. Acrobat will usually keep the same filename and add the .tiff extension. You can also change the folder or filename.
  4. Click on the Save button.
  5. Wait. The process takes about 10 minutes and saves each page of the original PDF file into a separate .tif or .tiff file.
  6. Close Adobe Acrobat.
  7. Open Microsoft Office Document Imaging by going to Start > Programs > Microsoft Office > Microsoft Office Tools.
  8. Open one of the generated .TIFF files.
  9. Go to Tools > Recognize text using OCR.
  10. After the text is recognized, go to Tools > Send Text to Word. The process takes about 5 to 10 minutes. Each time you do this, a new instance of MS Word is opened. We have not found a way to avoid this.
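
Because step 5 produces one image file per page, it helps to list those files in page order before opening them one by one. The following Python sketch is only an illustration and is not part of the method itself. It assumes that each file name ends with its page number (for example, the hypothetical name contract_Page_10.tif); the names that Acrobat actually generates may differ, so adjust the pattern if needed.

```python
import re
from pathlib import Path


def page_number(name: str) -> int:
    """Return the last run of digits in the file name (without extension), or 0."""
    digits = re.findall(r"\d+", Path(name).stem)
    return int(digits[-1]) if digits else 0


def sort_pages(names: list[str]) -> list[str]:
    """Sort file names by page number so that page 2 comes before page 10."""
    return sorted(names, key=lambda n: (page_number(n), n))


def list_page_files(folder: str) -> list[str]:
    """Collect the .tif/.tiff files in a folder, in page order."""
    names = [
        p.name for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in {".tif", ".tiff"}
    ]
    return sort_pages(names)


# Example use (the folder name is hypothetical):
# for name in list_page_files("C:/scans/contract"):
#     print(name)
```

A plain alphabetical sort would place page 10 before page 2, which is why the sketch sorts on the numeric value instead.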

We suggest converting just one page first to see whether Microsoft Office Document Imaging recognizes the text correctly.

Since OCR processing depends on the language to be recognized, make sure that the right language is selected under Microsoft Office Document Imaging > Tools > Options > OCR > OCR Language. For example, to recognize a document in English, this option must be set to English.

There are several OCR software packages available, but some require a very specific file type. We have used TextBridge Classic 2.0 ®, but not for processing .TIFF files, since it does not recognize this file type.

Some .PDF files allow you to select text to copy-and-paste into an MS Word document.

To do this, follow these steps:

  1. Open Adobe Acrobat Reader.
  2. Open the PDF file.
  3. Go to Tools > Basic > Selection > Select Text.
  4. Highlight the text that you want to copy to MS Word.
  5. Press Ctrl + C to copy the text (press Ctrl and C at the same time).
  6. Open MS Word.
  7. Press Ctrl + V to paste the text (press Ctrl and V at the same time), or go to Edit > Paste.
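
Text copied from a PDF often keeps a line break at the end of every visual line, which can get in the way later. As an optional clean-up, the following Python sketch joins such lines back into paragraphs. It is a minimal illustration, not part of the steps above, and it assumes that paragraphs in the copied text are separated by at least one blank line; the function name is our own.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines into paragraphs; blank lines separate paragraphs."""
    # Normalize Windows and old Mac line endings to "\n"
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split wherever one or more blank lines occur
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for paragraph in paragraphs:
        # Collapse every run of whitespace, including line breaks, into one space
        cleaned.append(" ".join(paragraph.split()))
    return "\n\n".join(cleaned)


# Example use: save the copied text to a file, then
# print(unwrap_paragraphs(open("copied.txt", encoding="utf-8").read()))
```

If the copied text has no blank lines between paragraphs, the whole text will come out as a single paragraph, so check the result before using it.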

We sometimes need Word files, especially when we use a CAT tool such as Trados, SDLX, or Wordfast. We also use this method when we need to distribute a large PDF file among several translators, proofreaders, or editors.
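
When a large file is shared among several people, the pages have to be divided somehow. The following Python sketch shows one simple way to do it: contiguous page ranges of nearly equal size. It is only an illustration under our own assumptions (the function name and the even split are not prescribed by the method); in practice you may prefer to split at chapter or section boundaries.

```python
def split_pages(page_count: int, people: int) -> list[range]:
    """Divide pages 1..page_count into contiguous ranges of nearly equal size."""
    if page_count < 0 or people < 1:
        raise ValueError("page_count must be >= 0 and people must be >= 1")
    base, extra = divmod(page_count, people)
    ranges = []
    start = 1
    for i in range(people):
        # The first `extra` people each receive one additional page
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


# Example use: 10 pages shared among 3 translators
for number, pages in enumerate(split_pages(10, 3), start=1):
    print(f"Translator {number}: pages {list(pages)}")
```

Keeping each person's pages contiguous makes it easier to merge the translated parts back in the original order.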

I hope that you have found this article useful. Please send me any comments by email.


© ANVICA Software Development 2002—2009. All rights reserved.