THE HOW-TO LIBRARY


PDF and TIFF files


By Patricia Fierro. Submitted on January 2, 2007

About the author: Patricia Fierro is a software engineer with 15 years of translation experience. She is a native Spanish speaker from Ecuador who lived in the US for fourteen years, and she works as an English > Spanish and Portuguese > Spanish translator.



We will describe an alternative way of converting PDF (Adobe Acrobat) files into files that can be read by Microsoft® Office Word. This method can be used if you have either Microsoft® Office XP or Microsoft® Office 2003.

We realize that there are already several articles describing tools and software that can read PDF files or convert them by using OCR (Optical Character Recognition). We hope that you will find the following method useful for Acrobat files when other methods have not worked.

PDF files that were created by scanning generally contain images of the pages rather than text, so they cannot be converted directly; the text has to be recognized by OCR first. Some scanners create PDF files automatically, and scanning documents is now more common than sending them by fax or by postal mail. Sometimes you cannot ask the client to send you a Microsoft Word document.

The following method can be used to convert .PDF text files to MS Word files. Unfortunately, it will not convert images or tables.

We used Adobe® Acrobat 6.0.

The method is:

  1. Open Adobe Acrobat.
  2. Open the PDF file that you want to convert.
  3. Save the PDF file as a TIFF file by going to File > Save As. Click on the Save As Type dropdown list and choose TIFF. Acrobat will usually keep the same filename and add the .tiff extension. You can also change the folder or filename.
  4. Click on the Save button.
  5. Wait. The process takes about 10 minutes and saves each page of the original PDF file into a separate .tif or .tiff file.
  6. Close Adobe Acrobat.
  7. Open Microsoft Office Document Imaging by going to Start > Programs > Microsoft Office > Microsoft Office Tools.
  8. Open one of the .TIFF files that were generated.
  9. Go to Tools > Recognize Text Using OCR.
  10. After the text is recognized, go to Tools > Send Text to Word. The process takes about 5 to 10 minutes. Each time you do this, a new instance of MS Word will be opened. We have not found a way to avoid this.

We suggest converting just one page at first, to check whether Microsoft Office Document Imaging is able to recognize the text correctly.
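Because each page ends up in its own .tif or .tiff file, it can help to list those files in page order before opening them one at a time. The following is only a minimal Python sketch, not part of the method itself. It assumes that the page number is the last run of digits in each filename (for example, a hypothetical `report_Page_2.tiff`); the function names `page_number` and `list_page_files` are made up for this illustration, and the naming pattern your version of Acrobat uses may differ.

```python
import re
from pathlib import Path


def page_number(path: Path) -> int:
    """Return the last run of digits in the file name, or 0 if there is none.

    Assumption: the exported files carry the page number as the last
    number in their name. Adjust this if your files are named differently.
    """
    digits = re.findall(r"\d+", path.stem)
    return int(digits[-1]) if digits else 0


def list_page_files(folder: str) -> list[Path]:
    """Collect the .tif/.tiff files in a folder, sorted by page number."""
    pages = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in {".tif", ".tiff"}
    ]
    # Sort numerically so that page 10 comes after page 2, not before it.
    return sorted(pages, key=lambda p: (page_number(p), p.name))


if __name__ == "__main__":
    # List the page files found in the current folder, in page order.
    for page in list_page_files("."):
        print(page.name)
```

Sorting by the number rather than by the plain filename avoids the usual problem where "Page_10" is listed before "Page_2".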

Since OCR processing depends on the language to be recognized, make sure that the right language is selected under Microsoft Office Document Imaging > Tools > Options > OCR > OCR Language. For example, if you want to recognize a document in English, this option must be set to English.

There are several OCR software packages available, but some require a very specific file type. We have used TextBridge Classic 2.0®, but not for processing .TIFF files, since it does not recognize this file type.

Some .PDF files allow you to select text to copy-and-paste into an MS Word document.

To do this, follow these steps:

  1. Open Adobe Acrobat Reader.
  2. Open the PDF file.
  3. Go to Tools > Basic > Selection > Select Text.
  4. Highlight the text that you want to copy to MS Word.
  5. Press Ctrl + C (Ctrl and C at the same time) to copy the text.
  6. Open MS Word.
  7. Press Ctrl + V (Ctrl and V at the same time) to paste the text, or go to Edit > Paste.

We sometimes need Word files, especially when we are going to use a CAT tool such as Trados, SDLX, or Wordfast. We also use this method when we need to distribute a large PDF file among several translators, proofreaders, or editors.
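When a large file is shared out among several people, the page ranges have to be decided somehow. As an illustration only, here is a small Python sketch that divides the pages into contiguous ranges that are as even as possible. The function name `split_pages` and the names of the people are hypothetical, and in practice you may prefer to split by chapter or by word count instead.

```python
def split_pages(total_pages: int, people: list[str]) -> dict[str, range]:
    """Assign contiguous, 1-based page ranges to each person as evenly as possible.

    Assumption: every name in `people` is unique; a repeated name would
    overwrite the earlier assignment.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    if not people:
        raise ValueError("at least one person is required")

    # Everyone gets `base` pages; the first `extra` people get one more.
    base, extra = divmod(total_pages, len(people))
    assignments: dict[str, range] = {}
    start = 1
    for index, person in enumerate(people):
        count = base + (1 if index < extra else 0)
        assignments[person] = range(start, start + count)
        start += count
    return assignments


if __name__ == "__main__":
    for person, pages in split_pages(10, ["Ana", "Luis", "Marta"]).items():
        if len(pages) > 0:
            print(f"{person}: pages {pages[0]} to {pages[-1]}")
        else:
            print(f"{person}: no pages")
```

Keeping each person's pages contiguous means each translator, proofreader, or editor receives one continuous stretch of the document.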

I hope that you have found this article useful. Please send me any comments by email.

TranslatorsCafé.com Copyright © ANVICA Software Development 2002—2009. All rights reserved.