Thread information
Last activity: 2/9/2012 22:13
3 replies, 3768 views

Posted: 22 November 2009 20:21 GMT
Post #189991
John Espi
New User
Posts: 3
Joined: 19 November 2009
Location: USA
 
JRC-ACQUIS Multilingual Parallel Corpus

I recently downloaded the JRC-ACQUIS Multilingual Parallel Corpus from http://wt.jrc.it/lt/Acquis/ because I do a lot of translation that deals with EU matters.

Has anyone been able to import this into a TM?

 

I followed the directions on the website and got a giant XML file that looks like this:

<link type="1:1" xtargets="3;3">
<s1>THE COUNCIL OF THE EUROPEAN COMMUNITIES,</s1>
<s2>LE CONSEIL DES COMMUNAUTÉS EUROPÉENNES,</s2>
</link>
<link type="1:1" xtargets="4;4">
<s1>of the one part, and</s1>
<s2>d'une part,</s2>
</link>
<link type="1:1" xtargets="5;5">
<s1>THE SWISS FEDERAL COUNCIL,</s1>
<s2>LE CONSEIL FÉDÉRAL SUISSE,</s2>
</link>

 

I have Deja Vu, but I'm not sure how to align it, since the source and target are both in the same file. I also have OmegaT installed.

Really, any advice would be appreciated...

 

Thanks,

John
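Since every `<link>` in the excerpt above holds the English sentence in `<s1>` and the French one in `<s2>`, the two sides can be separated without any alignment step. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree.iterparse`, which processes the file element by element instead of building one large tree in memory. It assumes the real file wraps the `<link>` elements in a single root element (the `<root>` wrapper in `SAMPLE` is hypothetical, as is the function name `iter_links`) and that each link contains at most one `<s1>` and one `<s2>`.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple, Union

# Excerpt from the post above; the <root> wrapper is a hypothetical stand-in
# for whatever root element the real alignment file uses.
SAMPLE = """<root>
<link type="1:1" xtargets="3;3">
<s1>THE COUNCIL OF THE EUROPEAN COMMUNITIES,</s1>
<s2>LE CONSEIL DES COMMUNAUTÉS EUROPÉENNES,</s2>
</link>
<link type="1:1" xtargets="4;4">
<s1>of the one part, and</s1>
<s2>d'une part,</s2>
</link>
<link type="1:1" xtargets="5;5">
<s1>THE SWISS FEDERAL COUNCIL,</s1>
<s2>LE CONSEIL FÉDÉRAL SUISSE,</s2>
</link>
</root>""".encode("utf-8")


def iter_links(source: Union[str, BinaryIO],
               one_to_one_only: bool = False) -> Iterator[Tuple[str, str]]:
    """Yield (s1, s2) text pairs from an alignment file, one <link> at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "link":
            continue  # <s1>/<s2> are read when their parent <link> closes
        if one_to_one_only and elem.get("type") != "1:1":
            elem.clear()
            continue
        s1 = (elem.findtext("s1") or "").strip()
        s2 = (elem.findtext("s2") or "").strip()
        if s1 and s2:  # skip links where one side is missing or empty
            yield s1, s2
        elem.clear()  # drop the finished link's children to keep memory low


if __name__ == "__main__":
    # Print tab-separated pairs; a real run would pass a file path instead.
    for english, french in iter_links(io.BytesIO(SAMPLE)):
        print(f"{english}\t{french}")
```

Each yielded pair could just as well be written to two parallel text files or to a TMX file. The optional `one_to_one_only` flag keeps only links whose `type` attribute is `1:1`, on the (unverified) assumption that other link types may deserve a manual check.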


Posted: 23 November 2009 4:24 GMT
Post #189999 (in reply to #189991)
Dragomir Kovačević
Regular
Mother tongues: Serbian, Croatian
Posts: 77
Joined: 17 March 2004
Location: Italy
 
RE: JRC-ACQUIS Multilingual Parallel Corpus
John, when you download a package from the Acquis site, there should be a TM editor (an .exe file) inside. After decompressing the whole package, run the .exe (by clicking on it) and choose your language pairs. The output is a ready-to-use TM; I don't remember now whether it is in TMX or TXT format, but I guess it is already TMX. The encoding can be adjusted later; if it is UTF-8 by default, then it is perfect for OmegaT. To eliminate any internal tags in the TMX (tags inside segments), try one of the open-source tools, such as Okapi Olifant. That way you will get leaner, flat text. But I doubt there are any extra formatting tags in the Acquis TMs.

Posted: 23 November 2009 19:19 GMT
Post #190074 (in reply to #189991)
John Espi
New User
Posts: 3
Joined: 19 November 2009
Location: USA
 
RE: JRC-ACQUIS Multilingual Parallel Corpus

Thanks, Dragomir...

 

After looking into this a bit more, I found that there are two sets of data over there: one is the TMX one, and the other is the one I was referring to.

 

I went with this choice because the page at http://langtech.jrc.it/JRC-Acquis.html said:

 

What is the difference between the DGT Translation Memory and the JRC-Acquis?

The two resources are rather similar in nature as they are both based on the Acquis Communautaire, but they are not identical and can both serve different purposes. The main differences are the following:

  • The collection of documents of both resources should mostly be the same, but they are not identical as both resources were collected in different ways. None of the resources is exactly equivalent to the Acquis Communautaire. The criteria for the collection of the JRC-Acquis were rather loose (all documents were collected which were available in at least ten languages of which at least three 'new' EU languages) so that the JRC-Acquis is bigger.

 

That being said, the automated tool seems to work only with the DGT TM.

 

With some ugly hacking, I've been able to use a script to get the terms into TMX format. I also have them in two files, one source and one target, with one term on each line.
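John's script isn't shown, so the following is not it; it is a minimal Python sketch of one way to turn the two-file variant (one source file, one target file, one entry per line) into TMX while streaming, so memory use does not grow with file size. The function name `write_tmx`, the language codes `EN`/`FR` and the `creationtool` value `acquis2tmx` are hypothetical; only the header attributes that TMX 1.4 requires are filled in. Lines that are empty on either side are skipped, and a difference in line counts is treated as an error.

```python
import io
from itertools import zip_longest
from typing import Iterable, TextIO
from xml.sax.saxutils import escape, quoteattr


def write_tmx(src_lines: Iterable[str], tgt_lines: Iterable[str], out: TextIO,
              src_lang: str = "EN", tgt_lang: str = "FR") -> int:
    """Stream line-aligned source/target text into TMX 1.4; return units written.

    `out` should be a text stream opened with encoding="utf-8", to match
    the XML declaration written below.
    """
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n<tmx version="1.4">\n')
    # Required TMX 1.4 header attributes; "acquis2tmx" is a hypothetical tool name.
    out.write(
        '<header creationtool="acquis2tmx" creationtoolversion="0.1" '
        f'segtype="sentence" o-tmf="none" adminlang="en" srclang={quoteattr(src_lang)} '
        'datatype="plaintext"/>\n<body>\n'
    )
    written = 0
    for line_no, (s, t) in enumerate(zip_longest(src_lines, tgt_lines), start=1):
        if s is None or t is None:
            raise ValueError(f"source and target differ in length at line {line_no}")
        s, t = s.strip(), t.strip()
        if not s or not t:
            continue  # keep only pairs that have text on both sides
        # escape() turns &, < and > into entities so the segments stay well-formed.
        out.write(
            "<tu>\n"
            f"  <tuv xml:lang={quoteattr(src_lang)}><seg>{escape(s)}</seg></tuv>\n"
            f"  <tuv xml:lang={quoteattr(tgt_lang)}><seg>{escape(t)}</seg></tuv>\n"
            "</tu>\n"
        )
        written += 1
    out.write("</body>\n</tmx>\n")
    return written


if __name__ == "__main__":
    # For real files, pass open(path, encoding="utf-8") handles instead of lists.
    buf = io.StringIO()
    n = write_tmx(["THE SWISS FEDERAL COUNCIL,\n"], ["LE CONSEIL FÉDÉRAL SUISSE,\n"], buf)
    print(n)
    print(buf.getvalue())
```

Writing the markup by hand instead of building an `ElementTree` keeps only one pair in memory at a time, which seems the safer choice for a 100 MB input.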

 

The problem is that no TM tool I've found can handle a 100 MB text file. Deja Vu crashes right away. Stingray just runs forever; I killed it after 3 hours.

I looked at OmegaT, but that too crashes.

And Olifant looks nice, but I cannot for the life of me figure out how to import two aligned files.

Does anyone know of a tool that can handle large text files for importing?

Thanks,
John
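If a tool copes with smaller files but not with a single 100 MB one, one possible workaround (an assumption, not something confirmed in this thread for Deja Vu, Stingray or OmegaT) is to split the two line-aligned files into matching chunks and import them one at a time. This is a minimal Python sketch; the function names, the default chunk size of 50,000 lines and the `chunk_NNN.src.txt` / `chunk_NNN.tgt.txt` naming are all hypothetical choices.

```python
import os
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List, Tuple


def chunk_pairs(src_lines: Iterable[str], tgt_lines: Iterable[str],
                size: int) -> Iterator[List[Tuple[str, str]]]:
    """Yield lists of at most `size` (source, target) line pairs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    pairs = zip_longest(src_lines, tgt_lines)  # None marks a missing line
    while True:
        block = list(islice(pairs, size))
        if not block:
            return
        if any(s is None or t is None for s, t in block):
            raise ValueError("source and target have different line counts")
        yield block


def split_aligned_files(src_path: str, tgt_path: str, out_dir: str,
                        size: int = 50_000) -> int:
    """Write chunk_001.src.txt / chunk_001.tgt.txt, ... to out_dir; return chunk count."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for count, block in enumerate(chunk_pairs(src, tgt, size), start=1):
            stem = os.path.join(out_dir, f"chunk_{count:03d}")
            with open(stem + ".src.txt", "w", encoding="utf-8") as s_out, \
                 open(stem + ".tgt.txt", "w", encoding="utf-8") as t_out:
                for s, t in block:  # lines keep their own newlines
                    s_out.write(s)
                    t_out.write(t)
    return count
```

Both files are read line by line, so memory use is bounded by one chunk. Note that the check only catches a difference in total line count; it cannot tell whether the pairs themselves are correctly aligned.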


Posted: 24 November 2009 4:00 GMT
Post #190088 (in reply to #190074)
Didier Briel
Member
Mother tongue: French
Posts: 35
Joined: 10 April 2005
Location: France
 
RE: JRC-ACQUIS Multilingual Parallel Corpus

Originally written by John Espi on November 23, 2009 7:19 PM:
The problem is that no TM tool I find can deal with a 100MB text file.  Deja Vu crashes right away. Stingray just runs forever.  I killed it after 3 hours.

I looked at OmegaT, but that too crashes.

Normally, OmegaT should handle a 100 MB TMX.

What memory had you allocated to OmegaT (and under which operating system)?

Didier

 


TranslatorsCafé.com


Copyright © ANVICA Software Development 2002–2012. All rights reserved.