



THE HOW-TO LIBRARY



PDF and TIFF files


By Patricia Fierro. Submitted on January 2, 2007

About the author: Patricia Fierro is a software engineer with 15 years of translation experience. She is a native Spanish speaker from Ecuador who lived in the US for fourteen years, and she translates English > Spanish and Portuguese > Spanish.



We will describe an alternative way of converting PDF files into files that can be opened in Microsoft® Office Word. This method can be used if you have either Microsoft® Office XP or Microsoft® Office 2003.

We realize that several articles already describe tools that read PDF files directly or convert them by using OCR (Optical Character Recognition). We hope that you will find the following method useful for Acrobat files when those other methods did not work.

PDF files that were created by scanning generally contain only images of the pages, so their text cannot be selected or exported directly and has to be recognized by OCR. Some scanners create PDF files automatically, and sending scanned files is now more common than sending documents by fax or by snail mail. Sometimes you cannot ask the client to send you a Microsoft Word document instead.

The following method can be used to convert .PDF text files to MS Word files. Unfortunately, it will not convert images or tables.

We have used Adobe ® Acrobat 6.0.

The method is:

  1. Open Adobe Acrobat.
  2. Open the PDF file that you want to convert.
  3. Save the PDF file as a TIFF file by going to File > Save As. In the Save As Type dropdown list, choose TIFF. Acrobat will usually keep the same filename and add the .tiff extension. You can also change the folder or filename.
  4. Click on the Save button.
  5. Wait. The process can take about 10 minutes and saves each page of the original PDF file into a separate .tif or .tiff file.
  6. Close Adobe Acrobat.
  7. Open Microsoft Office Document Imaging by going to Start > Programs > Microsoft Office > Microsoft Office Tools.
  8. Open one of the .TIFF files that were generated.
  9. Go to Tools > Recognize text using OCR.
  10. After the text is recognized, go to Tools > Send Text to Word. The process takes about 5 to 10 minutes. Each time you do this, a new instance of MS Word will be opened. We have not found a way to avoid this.
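
Because step 5 produces one TIFF file per page, it helps to open the files in page order in step 8. A plain alphabetical listing puts page 10 before page 2, so the following minimal Python sketch sorts the files by the page number in their names. The filename pattern used here (a page number at the end of the name, as in contract_Page_2.tiff) is an assumption for illustration; check what your version of Acrobat actually produces.

```python
import re
from pathlib import Path


def page_number(filename: str) -> int:
    """Return the last run of digits in the file's stem, or -1 if there is none."""
    digits = re.findall(r"\d+", Path(filename).stem)
    return int(digits[-1]) if digits else -1


def pages_in_order(filenames):
    """Keep only .tif/.tiff files and sort them by page number."""
    tiffs = [f for f in filenames if Path(f).suffix.lower() in (".tif", ".tiff")]
    return sorted(tiffs, key=page_number)


def list_page_files(folder):
    """List the page TIFFs found in a folder, in page order."""
    return pages_in_order(p.name for p in Path(folder).iterdir() if p.is_file())


if __name__ == "__main__":
    # Hypothetical filenames, assumed to end in the page number.
    names = ["contract_Page_10.tiff", "contract_Page_2.tiff",
             "contract_Page_1.tiff", "notes.txt"]
    for name in pages_in_order(names):
        print(name)
```

Calling list_page_files on the folder chosen in step 3 gives the order in which to open the pages in Microsoft Office Document Imaging.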

We suggest converting just one page at first to check whether Microsoft Office Document Imaging recognizes the text correctly.

Since OCR processing depends on the language to be recognized, make sure that the right language is selected under Microsoft Office Document Imaging > Tools > Options > OCR > OCR Language. For example, to recognize a document in English, this option must be set to English.

Several other OCR software packages are available, but some require a very specific file type. We have used TextBridge Classic 2.0®, but not for processing .TIFF files, since it does not recognize this file type.

Some .PDF files allow you to select text to copy-and-paste into an MS Word document.

To do this, follow these steps:

  1. Open Adobe Acrobat Reader.
  2. Open the PDF file.
  3. Go to Tools > Basic > Selection > Select Text.
  4. Highlight the text that you want to copy to MS Word.
  5. Press Ctrl + C (Ctrl and C at the same time) to copy the text.
  6. Open MS Word.
  7. Press Ctrl + V (Ctrl and V at the same time) to paste the text, or go to Edit > Paste.
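
Text copied from a PDF often arrives with a line break at the end of every printed line, which can make a CAT tool split sentences in the wrong places. The following is a minimal Python sketch of one way to clean such text before pasting it into MS Word. It assumes that paragraphs are separated by blank lines and that a line ending in a hyphen followed by a lowercase letter is a word split across two lines; neither assumption holds for every document.

```python
import re


def reflow(pasted: str) -> str:
    """Join hard-wrapped lines into paragraphs; blank lines separate paragraphs."""
    text = pasted.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for paragraph in paragraphs:
        lines = [line.strip() for line in paragraph.split("\n") if line.strip()]
        if not lines:
            continue
        joined = lines[0]
        for line in lines[1:]:
            if joined.endswith("-") and line[0].islower():
                # Assumed to be a word split across lines: drop the hyphen.
                joined = joined[:-1] + line
            else:
                joined = joined + " " + line
        cleaned.append(joined)
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "The trans-\nlation was sent\nto the client.\n\nSecond para-\ngraph."
    print(reflow(sample))
```

Note that the hyphen rule also joins genuine compounds that happen to break at a line end (for example "well-" followed by "known"), so the result should still be proofread against the PDF.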

We often need Word files, especially when we use a CAT tool such as Trados, SDLX, or Wordfast. We also use this method when we need to distribute a large PDF file among several translators, proofreaders, or editors.
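
When a large file is divided among several people, the page ranges have to be worked out first. As a rough illustration, the Python sketch below splits a document into contiguous, nearly equal page ranges. The function name split_pages and the even split by page count are assumptions made for this example; in practice you may prefer to split at chapter boundaries or by word count.

```python
def split_pages(page_count: int, translators: int):
    """Divide pages 1..page_count into contiguous, nearly equal ranges."""
    if page_count < 1 or translators < 1:
        raise ValueError("page_count and translators must be at least 1")
    base, extra = divmod(page_count, translators)
    ranges = []
    start = 1
    for i in range(translators):
        # The first `extra` translators receive one additional page.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more translators than pages
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    for first, last in split_pages(10, 3):
        print(f"pages {first}-{last}")
```

Since the TIFF method saves one file per page, each range maps directly to a set of page files that can be converted and sent to one person.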

I hope that you have found this article useful. Please send me any comments by email.

TranslatorsCafé.com © ANVICA Software Development 2002—2008. All rights reserved.