We will describe an alternative way of converting PDF (Adobe Acrobat) files into files
that can be opened in Microsoft® Office Word. This method can be used
if you have either Microsoft® Office XP or Microsoft® Office 2003.
We realize that there are already several articles that describe different tools
and software that can recognize PDF files or convert them by using OCR (Optical
Character Recognition). We hope that you will find the following method useful
when dealing with Acrobat files if other methods did not work.
PDF files that were created by scanning usually contain only images of the
pages, so many OCR tools cannot convert them directly. Some scanners
automatically create PDF files. Scanning documents is nowadays more common
than sending them by fax or by snail mail, and sometimes you cannot ask
the client to send you a Microsoft Word document.
The following method can be used to convert the text of a PDF file into an
MS Word file. Unfortunately, it will not convert images or tables.
We used Adobe® Acrobat 6.0.
The method is:
- Open Adobe Acrobat Reader.
- Open the PDF file that you want to convert.
- Save the PDF file as a TIFF file by going to File > Save As. Click on the
dropdown list (Save As Type) and choose TIFF. Acrobat will usually keep
the same filename and add the .tiff extension. You can also change the
folder or filename.
- Click on the Save button.
- Wait. The process takes about 10 minutes and saves each page of the
original PDF file into a separate .tif or .tiff file.
- Close Adobe Acrobat Reader.
- Open Microsoft Office Document Imaging by going to Start > Programs >
Microsoft Office > Microsoft Office Tools.
- Open one of the .TIFF files that was generated.
- Go to Tools > Recognize text using OCR.
- After the text is recognized, you can go to Tools > Send Text to Word.
The process takes about 5 or 10 minutes. Each time you do this, a new instance
of MS Word will be opened. We have not found a way to avoid this.
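Because Acrobat writes one TIFF file per page, it helps to see the page files in the right order before opening them one by one in Microsoft Office Document Imaging. The following Python script is only a rough sketch and is not part of the method we tested. It assumes that each filename ends with the page number (as in the hypothetical `report_Page_2.tif`); check the names that your version of Acrobat actually produces.

```python
import re
import sys
from pathlib import Path


def page_number(path):
    """Return the last run of digits in the filename, or 0 if there is none.

    Assumption: the page number is the last number in the name,
    as in the hypothetical 'report_Page_2.tif'.
    """
    digits = re.findall(r"\d+", path.stem)
    return int(digits[-1]) if digits else 0


def page_files_in_order(folder):
    """List the .tif/.tiff files in a folder, sorted by page number."""
    pages = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in {".tif", ".tiff"}
    ]
    # Sort numerically so that page 10 comes after page 2, not before it.
    return sorted(pages, key=lambda p: (page_number(p), p.name))


if __name__ == "__main__":
    # Folder to scan: first command-line argument, or the current folder.
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for page in page_files_in_order(folder):
        print(page.name)
```

A plain alphabetical listing would place page 10 before page 2, which is why the sketch sorts on the number instead of the name.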
We suggest converting just one page at first to see whether Microsoft Office
Document Imaging is able to recognize the text correctly.
Since OCR processing depends on the language to be recognized, make sure
that the right language is selected under Microsoft Office Document Imaging >
Tools > Options > OCR > OCR Language. For example, if the document that you
want to recognize is in English, this option must be set to English.
There are several OCR software packages available, but some require a very
specific file type. We have used TextBridge Classic 2.0®, but not
for processing .TIFF files, since it does not recognize this type of file.
Some .PDF files allow you to select text to copy-and-paste into an MS Word
document.
To do this, follow these steps:
- Open Adobe Acrobat Reader.
- Open the PDF file.
- Go to Tools > Basic > Selection > Select Text.
- Highlight the text that you want to copy to MS Word.
- Press Ctrl + C to copy the text (press Ctrl and C at the same time).
- Open MS Word.
- Press Ctrl + V to paste the text (press Ctrl and V at the same time), or go
to Edit > Paste.
We sometimes need Word files, especially when we use a CAT tool such as
Trados, SDLX, or Wordfast. We also use this method when we need to
distribute a large PDF file among several translators, proofreaders, or
editors.
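When a large file has to be shared out, the page ranges can be worked out before any conversion starts. The Python sketch below is only an illustration of one simple way to do it, not a tool we have used: it gives each person a block of consecutive pages of nearly equal size. The function name and the people's names are hypothetical.

```python
def split_pages(page_count, people):
    """Assign consecutive page ranges (1-based, inclusive) to each person.

    Returns a dict that maps each name to a (first_page, last_page) tuple,
    or to None if there are fewer pages than people and none are left.
    """
    if page_count < 0 or not people:
        raise ValueError("need a non-negative page count and at least one person")
    # Everyone gets 'base' pages; the first 'extra' people get one more.
    base, extra = divmod(page_count, len(people))
    assignments = {}
    start = 1
    for index, person in enumerate(people):
        size = base + (1 if index < extra else 0)
        assignments[person] = (start, start + size - 1) if size else None
        start += size
    return assignments


if __name__ == "__main__":
    # Hypothetical example: a 10-page file shared by three translators.
    print(split_pages(10, ["Ana", "Ben", "Carla"]))
    # prints {'Ana': (1, 4), 'Ben': (5, 7), 'Carla': (8, 10)}
```

Consecutive blocks keep each person's pages together, which makes it easier to reassemble the Word files in order afterwards.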
I hope that you have found this article useful. Please send me any
comments by email.