{{Short description|Segmented database to aid translators}}
{{citations missing|date=November 2022}}
 
A '''translation memory''' ('''TM''') is a database that stores "segments", which can be sentences, paragraphs or sentence-like units (headings, titles or elements in a list) that have previously been translated, in order to aid human [[Translation|translators]]. The translation memory stores the [[source text]] and its corresponding translation in language pairs called "translation units". Individual words are handled by terminology bases and are not within the domain of TM.
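
The translation-unit structure described above can be sketched as a small data record. This is an illustrative sketch only; the class and field names are assumptions, not taken from any particular TM tool:

```python
from dataclasses import dataclass

# Illustrative sketch of a translation unit as stored in a TM:
# a source segment paired with its translation, tagged with a language pair.
@dataclass(frozen=True)
class TranslationUnit:
    source_lang: str
    target_lang: str
    source: str  # a segment: a sentence, heading, title or list element
    target: str  # the previously approved translation of that segment

unit = TranslationUnit("en", "de", "Save your changes.",
                       "Speichern Sie Ihre Änderungen.")
print(unit.source, "->", unit.target)
```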
 
Software programs that use translation memories are sometimes known as '''translation memory managers''' ('''TMM''') or '''translation memory systems''' ('''TM systems''', not to be confused with a [[translation management system]] ('''TMS'''), which is another type of software focused on managing the process of translation).
 
Translation memories are typically used in conjunction with a dedicated [[computer-assisted translation]] (CAT) tool, [[wordprocessor|word processing]] program, [[terminology management systems|terminology management system]], multilingual dictionary, or even raw [[machine translation]] output.
 
Research indicates that many [[language industry|companies producing multilingual documentation]] are using translation memory systems. In a survey of language professionals in 2006, 82.5% out of 874 replies confirmed the use of a TM.<ref name="survey"/> Usage of TM correlated with text type characterised by technical terms and simple sentence structure (technical, to a lesser degree marketing and financial), computing skills, and repetitiveness of content.<ref name="survey">Elina Lagoudaki (2006), "Translation Memory systems: Enlightening users' perspective. Key finding of the TM Survey 2006 carried out during July and August 2006. (Imperial College London, Translation Memories Survey 2006), p.16 {{cite web |url=http://www3.imperial.ac.uk/portal/pls/portallive/docs/1/7307707.PDF |title=Archived copy |access-date=2007-03-25 |url-status=dead |archive-url=https://web.archive.org/web/20070325114619/http://www3.imperial.ac.uk/portal/pls/portallive/docs/1/7307707.PDF |archive-date=2007-03-25 }}</ref>
 
== Using translation memories ==
The program breaks the '''[[source text]]''' (the text to be translated) into segments, looks for matches between segments and the source half of previously translated source-target pairs stored in a '''translation memory''', and presents such matching pairs as full and partial '''matches'''. The translator can accept a match, replace it with a fresh translation, or modify it to match the source. In the last two cases, the new or modified translation goes into the database.
 
Some translation memory systems search for 100% matches only, i.e. they can only retrieve segments of text that match entries in the database exactly, while others employ [[Fuzzy string searching|fuzzy matching]] algorithms to retrieve similar segments, which are presented to the translator with differences flagged. Typical translation memory systems only search for text in the source segment.
 
The flexibility and robustness of the matching algorithm largely determine the performance of the translation memory, although for some applications the recall rate of exact matches can be high enough to justify the 100%-match approach.
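
As a rough illustration of exact versus fuzzy lookup, the following sketch uses Python's standard-library `difflib`. The tiny in-memory TM, the `lookup` helper and the 0.75 threshold are all illustrative assumptions; real TM systems use their own proprietary scoring:

```python
from difflib import SequenceMatcher

# Toy translation memory: source segments mapped to stored translations.
# The segments and translations here are purely illustrative.
tm = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "The file could not be opened.": "Le fichier n'a pas pu être ouvert.",
}

def lookup(segment, threshold=0.75):
    """Return (score, stored_source, stored_translation) for the best
    TM hit, or None if nothing scores at or above the threshold."""
    best = None
    for source, target in tm.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best

# A 100% match retrieves the stored translation directly, while a fuzzy
# match surfaces a similar segment for the translator to adapt.
print(lookup("Click the Save button."))
print(lookup("Click the Cancel button."))
```

A system restricted to 100% matches would simply replace the similarity ratio with an equality test, which is cheaper but retrieves far fewer candidates.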
 
=== Main obstacles ===
{{more citations needed section|date=April 2018}}
The main problems hindering wider use of translation memory managers include:
 
 
=== Effects on quality ===
The use of TM systems might have an effect on the quality of the texts translated. Its main effect is clearly related to so-called "error propagation": if the translation for a particular segment is incorrect, the incorrect translation is more likely to be reused the next time the same [[source text]], or a similar one, is translated, thereby perpetuating the error. Traditionally, two main effects on the quality of translated texts have been described: the "sentence-salad" effect (Bédard 2000; cited in O'Hagan 2009: 50) and the "peep-hole" effect (Heyn 1998). The first refers to a lack of coherence at the text level when a text is translated using sentences from a TM that were produced by different translators with different styles. According to the latter, translators may adapt their style to the TM system, avoiding intratextual references so that the segments can be better reused in future texts, thus affecting cohesion and readability (O'Hagan 2009).
 
There is also a potential, probably unconscious, effect on the translated text. Different languages use different sequences for the logical elements within a sentence, and a translator presented with a multiple-clause sentence that is half translated is less likely to completely rebuild the sentence. Consistent empirical evidence (Martín-Mor 2011) shows that translators will most likely modify the structure of a multiple-clause sentence when working with a text processor rather than with a TM system.
There is also a potential for the translator to deal with the text mechanically sentence-by-sentence, instead of focusing on how each sentence relates to those around it and to the text as a whole. Researchers (Dragsted 2004) have identified this effect, which relates to the automatic segmentation feature of these programs, but it does not necessarily have a negative effect on the quality of translations.
 
These effects are closely related to training rather than inherent to the tool. According to Martín-Mor (2011), the use of TM systems does have an effect on the quality of translated texts, especially for novices, but experienced translators are able to avoid it. Pym (2013) notes that "translators using TM/MT tend to revise each segment as they go along, allowing little time for a final revision of the whole text at the end", which might in fact be the ultimate cause of some of the effects described here.
 
==Types of TM systems==
* Desktop: Desktop translation memory tools are typically what individual translators use to complete translations. They are programs that a freelance translator downloads and installs on a desktop computer.
* Server-based or centralised: Centralised translation memory systems store TM on a central server. They work together with desktop TM and can increase TM match rates by 30–60% more than the TM leverage attained by desktop TM alone.
 
==Functions==
The following is a summary of the main functions of a translation memory.
 
=== Offline functions ===
 
==== Import ====
 
==History==
The 1970s were the infancy stage for TM systems, in which scholars carried out a preliminary round of exploratory discussions. The original idea for TM systems is often attributed{{according to whom|date=April 2018}} to Martin Kay's "Proper Place" paper,<ref>{{cite journal |last1=Kay |first1=Martin |title=The Proper Place of Men and Machines in Language Translation |journal=Machine Translation |date=March 1997 |volume=12 |issue=1–2 |pages=3–23 |doi=10.1023/A:1007911416676 |s2cid=207627954 }}</ref> although the details are not fully given there. The paper presented the basic concept of the storing system: "The translator might start by issuing a command causing the system to display anything in the store that might be relevant to .... Before going on, he can examine past and future fragments of text that contain similar material". Kay's observation was influenced by Peter Arthern's suggestion that translators could use similar, already translated documents online. In his 1978 article,<ref>{{cite journal|title=Machine Translation and Computerized Terminology Systems: A Translator's Perspective |last1=Arthern |first1=Peter |journal=Translating and the Computer: Proceedings of a Seminar, London, 14th November, 1978 |date=1978 |url=http://www.mt-archive.info/Aslib-1978-Arthern.pdf |isbn=0444853022}}</ref> Arthern gave a full demonstration of what we call TM systems today: "Any new text would be typed into a word processing station, and as it was being typed, the system would check this text against the earlier texts stored in its memory, together with its translation into all the other official languages [of the European Community]. ... One advantage over machine translation proper would be that all the passages so retrieved would be grammatically correct. In effect, we should be operating an electronic 'cut and stick' process which would, according to my calculations, save at least 15 per cent of the time which translators now employ in effectively producing translations."
 
The idea was incorporated into the ALPS (Automated Language Processing Systems) tools first developed by researchers at Brigham Young University, and at that time the idea of TM systems was mixed with a tool called "Repetitions Processing", which only aimed to find matched strings. Only after a long time did the concept of the so-called translation memory come into being.
 
The real exploratory stage of TM systems came in the 1980s. One of the first implementations of a TM system appeared in Sadler and Vendelmans' Bilingual Knowledge Bank. A Bilingual Knowledge Bank is a syntactically and referentially structured pair of corpora, one being a translation of the other, in which translation units are cross-coded between the corpora. Its aim is to develop a corpus-based general-purpose knowledge source for applications in machine translation and computer-aided translation (Sadler & Vendelman, 1987). Another important step was made by Brian Harris with his "bi-text". He defined the bi-text as "a single text in two dimensions" (1988), the source and target texts related by the activity of the translator through translation units, which echoes Sadler's Bilingual Knowledge Bank. In his work, Harris proposed something like a TM system without using that name: a database of paired translations, searchable either by individual word or by "whole translation unit", in the latter case the search being allowed to retrieve similar rather than identical units.
 
TM technology only became commercially available on a wide scale in the late 1990s, through the efforts of several engineers and translators. Of note is the first TM tool, called Trados ([[Trados|SDL Trados]] nowadays). In this tool, the user opens the source file and applies the translation memory, so that any "100% matches" (identical matches) or "fuzzy matches" (similar, but not identical matches) within the text are instantly extracted and placed within the target file. The "matches" suggested by the translation memory can then be either accepted or overridden with new alternatives. If a translation unit is manually updated, it is stored within the translation memory for future use as well as for repetition in the current text. In a similar way, all segments in the target file without a "match" are translated manually and then automatically added to the translation memory.
 
In the 2000s, online translation services began incorporating TM. Machine translation services like [[Google Translate]], as well as professional and "hybrid" translation services provided by sites like [[Gengo]] and [[Translation Cloud#Ackuna|Ackuna]], incorporate databases of TM data supplied by translators and volunteers to make more efficient connections between languages and provide faster translation services to end-users.<ref>[https://techcrunch.com/2016/11/22/googles-ai-translation-tool-seems-to-have-invented-its-own-secret-internal-language/ Google's AI translation tool seems to have invented its own secret internal language] Devin Coldewey, TechCrunch, November 22, 2016</ref>
 
===XLIFF===
'''[[XLIFF|XML Localisation Interchange File Format]]''' (XLIFF) is intended to provide a single interchange file format that can be understood by any localization provider. [[XLIFF]] is the preferred way<ref>{{Cite web|title=DITA Translation SC {{!}} OASIS|url=https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita-translation|access-date=2021-01-29|website=www.oasis-open.org}}</ref><ref>{{Citation|last=Roturier|first=Johann|title=XML for translation technology|date=2019-08-23|url=https://www.taylorfrancis.com/books/9781315311241/chapters/10.4324/9781315311258-3|work=The Routledge Handbook of Translation and Technology|pages=45–60|editor-last=O'Hagan|editor-first=Minako|edition=1|location=Abingdon, Oxon|publisher=Routledge|language=en|doi=10.4324/9781315311258-3|isbn=978-1-315-31125-8|s2cid=213287381|access-date=2021-01-29}}</ref> of exchanging data in XML format in the translation industry.<ref>[http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff XML Localisation Interchange File Format]
</ref>
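
To give a feel for the format, here is a minimal XLIFF 1.2 fragment, parsed with Python's standard-library `xml.etree.ElementTree`. The file name, ids and text content are illustrative assumptions; real XLIFF files carry many more attributes and elements:

```python
import xml.etree.ElementTree as ET

# A minimal XLIFF 1.2 document: one <file> holding one translation unit.
# Languages, ids and text are purely illustrative.
XLIFF = """<?xml version="1.0"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="example.txt" source-language="en" target-language="fr"
        datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Translation memory</source>
        <target>Mémoire de traduction</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
root = ET.fromstring(XLIFF)
# Each trans-unit pairs a source segment with its translation.
for unit in root.iterfind(".//x:trans-unit", NS):
    source = unit.find("x:source", NS).text
    target = unit.find("x:target", NS).text
    print(f"{unit.get('id')}: {source} -> {target}")
```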
 
===xml:tm===
{{Main article|xml:tm}}
The xml:tm (XML-based Text Memory) approach to translation memory is based on the concept of text memory, which comprises author and translation memory.<ref>{{cite web|title=OAXAL—What is it and why should I care|url=http://www.infomanagementcenter.com/enewsletter/200808/second.htm|work=CIDM Information Management News|access-date=March 30, 2013|author=Andrzej Zydroń|date=August 2008|quote=At the core of xml:tm are the following concepts which together make up 'Text Memory': Author Memory and Translation Memory.|archive-url=https://web.archive.org/web/20130517222230/http://www.infomanagementcenter.com/enewsletter/200808/second.htm|archive-date=May 17, 2013|url-status=dead|df=mdy-all}}</ref> xml:tm has been donated to Lisa OSCAR by XML-INTL.
 
===PO===
* [https://web.archive.org/web/20050123123221/http://ecolore.leeds.ac.uk/downloads/2003.05_bdue_survey_analysis.doc Ecolore survey of TM use by freelance translators (Word document)]
* [https://doi.org/10.1007%2Fs10590-008-9033-6 Power Shifts in Web-Based Translation Memory]
 
[[Category:Computer-assisted translation]]
[[Category:Translation databases]]