
Cat out of the bag: Data Exchange

September 30th, 2006 | by mm

From The Tool Kit by Jost Zetzsche.

A reader recently asked about exchanging translation memories between different translation environment tools. Here is the most fundamental guideline.

At this point there are essentially two exchange formats for translation memory files: the “official” exchange format TMX (Translation Memory eXchange) and the “unofficial” Trados text format (i.e., the format that Trados produces when you export a translation memory). The Trados text format has become a de facto exchange standard because so many tools can produce or import it directly (Wordfast, Déjà Vu, SDLX, MetaTexis, and others).
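For readers who have never looked inside a TMX file, here is a minimal sketch in Python that writes one translation unit per source/target pair. It assumes the element and attribute names of TMX version 1.4; the header values (such as the `creationtool` name) are placeholders, and real tools write more attributes than this.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespace URI with the reserved "xml" prefix,
# so the attribute is serialized as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en-US", tgt_lang="de-DE"):
    """Return a minimal TMX 1.4 document (as a string) for (source, target) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",                   # original TM format; placeholder value
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")     # one translation unit per pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Calling `build_tmx([("Hello", "Hallo")])` returns a document with a single `<tu>` that holds an `en-US` and a `de-DE` `<tuv>`, each wrapping its text in a `<seg>`.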

If both of the tools between which you want to exchange translation memories support both formats, it may make sense to test which format produces better results. The main thing to look for is the “inline codes” (i.e., codes that occur inside translation units). Because different tools handle inline codes in different ways, they have become the Achilles heel of data exchange (see my article on this at www.multilingual.com/zetzsche54.htm).
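One way to run such a test is to export a small memory, import it into the other tool, export it again, and compare the inline codes segment by segment. The Python sketch below assumes both files are TMX 1.4 (inline codes as `<bpt>`, `<ept>`, `<it>`, `<ph>`, `<hi>` or `<ut>` elements, language in `xml:lang`) and matches segments by their text with the inline elements stripped out. The function names are hypothetical, and a segment whose text changed during the round trip will simply be reported as missing.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Inline elements that TMX 1.4 allows inside <seg> (assumed input format).
INLINE_TAGS = {"bpt", "ept", "it", "ph", "hi", "ut"}


def plain_text(seg):
    """Segment text without inline elements: the seg's own text plus each child's tail."""
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    return "".join(parts).strip()


def inline_counts(tmx_text, lang):
    """Map each segment's plain text (for one language) to a Counter of its inline tags."""
    root = ET.fromstring(tmx_text)
    result = {}
    for tuv in root.iter("tuv"):
        if tuv.get(XML_LANG) != lang:
            continue
        seg = tuv.find("seg")
        if seg is None:
            continue
        # seg.iter() also visits nested inline elements (e.g. <ph> inside <hi>)
        result[plain_text(seg)] = Counter(e.tag for e in seg.iter() if e.tag in INLINE_TAGS)
    return result


def report_inline_loss(original_tmx, roundtrip_tmx, lang):
    """List segments that vanished or whose inline codes changed in the round trip."""
    before = inline_counts(original_tmx, lang)
    after = inline_counts(roundtrip_tmx, lang)
    problems = []
    for text, tags in before.items():
        if text not in after:
            problems.append((text, "segment missing"))
        elif after[text] != tags:
            problems.append((text, f"inline codes changed: {dict(tags)} -> {dict(after[text])}"))
    return problems
```

An empty list means every segment came back with the same number and kind of inline codes; it does not check that the codes sit in the same positions, which a fuller test would also want to verify.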

Another thing to look out for is the code page. Since version 7, Trados produces Unicode text files when it exports databases. Other tools may not be able to handle such files, or may not handle them by default. The work-around is either to convert the Trados export to a non-Unicode code page or to use TMX.
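If you go the code-page route, the conversion itself is a plain re-encoding. This Python sketch assumes the export is UTF-16 with a byte-order mark and that Windows-1252 (Western European) is the code page the receiving tool expects; both are assumptions to check against your own files and tools, and `convert_export` is a hypothetical name. Characters the target code page cannot represent are written as "?", so the function reports how many there were.

```python
from pathlib import Path


def _encodable(ch, codec):
    """True if a single character can be represented in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def convert_export(src, dst, target_codepage="cp1252", source_encoding="utf-16"):
    """Re-encode a Unicode text export into a single-byte code page.

    Returns the number of characters that had no equivalent in the
    target code page and were replaced with '?'.
    """
    # Decode the raw bytes ourselves so CRLF line endings are preserved;
    # the "utf-16" codec consumes the byte-order mark.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode(target_codepage, errors="replace"))
    return sum(1 for ch in text if not _encodable(ch, target_codepage))
```

A non-zero return value is a sign that this memory contains text the target code page cannot carry (for example, a Central European or Asian language pair), in which case TMX is the safer of the two work-arounds.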

Exchanging data between different systems is almost always possible, but never without some loss. Debbie Folaron and Philippe Mercierle wrote a very revealing article on this in a recent issue of Multilingual Computing, in which they demonstrated that even between different versions of one system (in that case, SDLX 3.5 and SDLX 2005), there was a loss of 13% when using the same translation memory!
