About the author: Jost Zetzsche is an ATA-certified English-to-German translator and a localization and translation consultant. He co-founded International Writers' Group on the Oregon coast and sends out a free, biweekly technical newsletter for translators (see www.internationalwriters.com/toolkit).
For new (and, sadly, experienced) users of translation memory programs,
the use of terminology databases often seems superfluous, if not
downright confusing.
There are several reasons for this:
— Obviously, the name "translation memory" program seems to suggest
that the emphasis is on the translation memory (most of the major
applications have recognized this confusion and no longer actually use
this terminology).
— There is a more immediate gain through perfect and fuzzy matches on
a sentence-by-sentence basis than there is with terminology databases.
— Translation memories can be relatively quickly built up by aligning
existing translated file pairs and/or automatically as you translate
new texts.
— The construction of terminology databases is a comparatively
tedious process: terms have to be individually highlighted in the
translation or even entered into the terminology management
application, and additional information has to be entered.
If it is indeed so tedious to build up and use terminology databases,
what makes them so important?
As every experienced translator knows, translation is much more than
the mere exchange of translated segments across situations and
contexts. Though most translators have one or several fields of
specialization within their language combination(s), very few, if any,
work exclusively in a field in which language is so controlled that
there is no need for additional information on individual terms and
phrases except their one-to-one translations. We all know that the
semantic fields of words and phrases in different languages do not map
onto each other one-to-one. (In a perfect match, a word or phrase in
Language A would always correspond to a single word or phrase in
Language B, which in turn would correspond only to that word or phrase
in Language A.) If things were that simple, there would be no need for
translators in the first place: machine translation would long since
have taken over our profession! The terminology database is the
place where you can invest effort into defining your words and phrases
grammatically, contextually, or even by contrast.
Of course, none of this is news to anyone: any good dictionary offers
the same concept. What makes these "dictionaries" (if you will) much
more exciting is that you can build them up the way you want them.
Furthermore, they are "living dictionaries" that present their findings
for each of the segments you are currently translating without you
having to do anything (if you have previously given them the data that
they now share with you).
Why then is it helpful to have numerous different translations for,
let's say, "cat" ("feline animal," "computer assisted translation,"
"Caterpillar," etc.) come up when I translate a text? Because of the
close association of the terminology database with your translation
project, and because of all the information that you or someone else
fed into the terminology database when the terms were entered, the
application can recognize which of these terms is most relevant.
Depending, for instance, on whether you are translating a text in the
subject area "Flora and Fauna," "Translation Technology," or "Heavy
Machinery" (to stay with the silly examples above), the application
will make the more likely choice for you (while still allowing you to
access the other ones). You can then enter the displayed terms with the
help of keyboard shortcuts or, as in the case of Déjà Vu,
even automatically "assemble" target segments with the relevant terms.
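To make the idea of subject-based relevance concrete, here is a minimal Python sketch. It is not how any particular tool works internally; the TermEntry class, the rank_candidates function, and the German targets are all hypothetical and only illustrate how a subject field stored with each term can push the most likely candidate to the top while keeping the others accessible.

```python
from dataclasses import dataclass


@dataclass
class TermEntry:
    source: str
    target: str
    subject: str  # subject area recorded when the term was entered


def rank_candidates(entries, source_term, project_subject):
    """Return all entries for source_term, those matching the project subject first."""
    matches = [e for e in entries if e.source.lower() == source_term.lower()]
    # sorted() is stable, so entries of equal relevance keep their original order.
    return sorted(matches, key=lambda e: e.subject != project_subject)


# Hypothetical English-to-German entries for the "cat" example.
termbase = [
    TermEntry("cat", "Katze", "Flora and Fauna"),
    TermEntry("CAT", "computergestützte Übersetzung", "Translation Technology"),
    TermEntry("Cat", "Caterpillar", "Heavy Machinery"),
]

ranked = rank_candidates(termbase, "cat", "Translation Technology")
print([e.target for e in ranked])
```

With the project subject set to "Translation Technology," the entry from that subject area is listed first and the other two follow in their original order.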
Every translation memory tool with a terminology database component
offers the capability to enter data in two different ways: as you
translate and by importing external files.
The first method is clearly the more neglected of the two. Unlike the
automatic transfer of every finished segment to the translation memory,
sending data to the terminology database is a largely manual process.
But, also unlike a segment in the translation memory, a newly entered
term is likely to give you a lot more matches.
In addition to the term pair that you send to the terminology database,
you're also automatically sending a variety of other data, including
the already-defined data from your project (such as file name, subject,
and client) as well as user, date, and time information. You're also
free to enter any other data that you
deem necessary, such as context, definitions, synonyms or antonyms, or
grammatical information. Someone at the conference last weekend asked
me how much and what kind of information should be entered. Even though
this sounds kind of flippant, I think that the correct answer is: As
much as you deem necessary to adequately describe the terms while
wisely using your time. It doesn't make sense to enter information
about the gender of a term that you as a native language translator
should know simply because there's the possibility to enter it;
however, depending on the entry, it may make a lot of sense to enter
some contextual or stylistic information, or in some cases grammatical
information (and that may even include information about the gender).
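The split between automatic and optional data can be pictured as a simple record. The following Python sketch is only an illustration under my own assumptions: the names ProjectContext, TermRecord, and add_term are hypothetical, and real tools store this data in their own formats.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ProjectContext:
    file_name: str
    subject: str
    client: str
    user: str


@dataclass
class TermRecord:
    source: str
    target: str
    file_name: str
    subject: str
    client: str
    user: str
    created: datetime
    notes: dict = field(default_factory=dict)  # optional descriptive data


def add_term(termbase, project, source, target, **notes):
    """Store a term pair; project data and the timestamp are filled in automatically."""
    record = TermRecord(
        source=source,
        target=target,
        file_name=project.file_name,
        subject=project.subject,
        client=project.client,
        user=project.user,
        created=datetime.now(),
        notes=dict(notes),
    )
    termbase.append(record)
    return record


# Hypothetical project and term, for illustration only.
project = ProjectContext("manual.docx", "Heavy Machinery", "Example Client", "jz")
termbase = []
record = add_term(termbase, project, "track", "Kette",
                  context="undercarriage of a crawler")
```

The translator supplies only the term pair and, where it is worth the time, a note such as the context; everything else comes from the project settings.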
The second method of reading data into a terminology database is by
importing an external file. Depending on the tool, a variety of formats
are supported (such as Excel, text files, database formats, or even
other CAT tool formats), and usually the import works seamlessly and is
fairly self-explanatory. It is, however, crucially important to define
descriptive fields (the fields aside from source and target) before you
import the external glossary file. You can usually do that by adding
columns or fields to the file. While it's possible to add that data
later on, doing so is much more cumbersome. If you fail to enter descriptive
data, your terminology database will quickly become meaningless because
it will contain a lot of linguistic data without any description.
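As a rough illustration of defining descriptive fields before import, the Python sketch below reads a tab-delimited glossary and fills in any missing descriptive fields from defaults chosen up front. The column names (source, target, subject, client) and the import_glossary function are assumptions made for this example and not the import format of any specific tool.

```python
import csv
import io

REQUIRED = ("source", "target")


def import_glossary(text, defaults):
    """Read a tab-delimited glossary; fill missing descriptive fields from defaults."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    entries = []
    for row in reader:
        if not all(row.get(col) for col in REQUIRED):
            continue  # skip rows without a usable term pair
        entry = dict(defaults)
        # Non-empty values from the file override the defaults.
        entry.update({k: v for k, v in row.items() if k is not None and v})
        entries.append(entry)
    return entries


# Hypothetical glossary: the second row has no subject of its own.
glossary = (
    "source\ttarget\tsubject\n"
    "cat\tKatze\tFlora and Fauna\n"
    "alignment\tAlignment\t\n"
)
defaults = {"subject": "Translation Technology", "client": "Example Client"}
entries = import_glossary(glossary, defaults)
```

Every imported entry ends up with a subject and a client, whether the value came from the file or from the defaults, so no entry arrives in the database without a description.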
Also, it's important to split up entries with multiple targets. We've all seen the
glossary that has one source term and several target terms separated by
commas. All of the target terms may be correct (see the above-mentioned
different meanings of "cat"), but they're a nuisance if you want to use
some of the features of your translation memory tool that allow you to
automatically enter the terms. What you need to do here is to create
several entries for one source term.
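Splitting such rows is easy to automate before the import. Here is a minimal Python sketch, assuming the glossary has already been read into (source, targets) pairs and that the comma is used only as a separator; the split_targets function and the sample rows are hypothetical.

```python
def split_targets(rows, separator=","):
    """Turn (source, 'a, b, c') rows into one (source, target) row per target."""
    result = []
    for source, targets in rows:
        for target in targets.split(separator):
            target = target.strip()
            if target:  # ignore empty pieces such as a trailing separator
                result.append((source, target))
    return result


# Hypothetical glossary rows, for illustration only.
rows = [("cat", "Katze, CAT, Caterpillar"), ("tool", "Werkzeug")]
print(split_targets(rows))
```

A target that legitimately contains a comma would be split incorrectly by this approach, so the result is worth a quick review before it goes into the terminology database.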
Lastly, when I talk about "terms" and "terminology databases,"
I'm not only referring to single words (like the above-mentioned
"cat"). Instead, I mean to include recurring compounds (such as
"computer assisted translation tool") and phrases (such as "depending
on the situation") as well.
And here's a controversial confession that will likely make enemies: I
work with some translation agencies that require the entering of terms
into a terminology database for each project. While this is a little
bit of extra work for me, I really like it because it forces the use of
terminology databases. In fact, I'd like to see more agencies follow
that lead. (And nothing speaks against some extra pay for entering
these terms. . . .)