
EXTRA: a system for example-based translation assistance

Machine Translation

Abstract

In this paper we present EXTRA (EXample-based TRanslation Assistant), a translation memory (TM) system. EXTRA proposes effective translation suggestions by relying on syntactic analysis of the text and on a rigorous, language-independent measure; it searches large amounts of bilingual text efficiently thanks to its advanced retrieval techniques. EXTRA uses no external knowledge that would require user intervention, and it is fully customizable and portable because it is implemented on top of a standard database management system (DBMS). The paper provides a thorough evaluation of both the effectiveness and the efficiency of our system. In particular, to quantify the benefits of EXTRA-assisted translation over manual translation, we introduce a simulator implementing specifically devised statistical, process-oriented, discrete-event models. As far as we know, this is the first time statistical simulation experiments have been used to address the nontrivial problem of evaluating TM systems, in particular to compare the time that could be saved by performing assisted translation rather than “manual” translation and to tune the system behaviour optimally for differently skilled users. In our experiments we considered three scenarios: manual translation with one translator, manual translation with two translators, and assisted translation with one translator. The time needed for one translator to do an assisted translation is significantly closer to that of a team of two translators than to that of a single translator working manually. The mean sentence translation time is by far the lowest for the assisted scenario, corresponding to the highest per-translator productivity. We also estimate the total translation time when the number of query sentences, the maximum number of suggestions to be read, and the probability of look-up are varied: the best trade-off is given by reading (and presenting) at most four or five suggestions.
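To make the three scenarios concrete, the following is a toy Monte Carlo sketch in Python. It is not the simulator described in the paper: the function name `simulate`, the exponential distribution of manual translation times, and every numeric constant (mean time of 60, reading cost of 5 per suggestion, 0.3 chance that a suggestion is usable, 0.4 post-editing factor) are illustrative assumptions. Only the three scenarios and the three varied parameters (number of query sentences, maximum number of suggestions read, probability of look-up) come from the abstract.

```python
import random


def simulate(n_sentences, mode, max_suggestions=4, p_lookup=0.7, seed=0):
    """Return the total elapsed time (arbitrary units) to translate n_sentences.

    mode is "manual_1", "manual_2", or "assisted_1".
    All timing constants below are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    translators = 2 if mode == "manual_2" else 1
    free_at = [0.0] * translators  # time at which each translator becomes idle

    for _ in range(n_sentences):
        # The next sentence goes to whichever translator is free first.
        i = min(range(translators), key=free_at.__getitem__)
        work = rng.expovariate(1 / 60.0)  # assumed manual time, mean 60

        if mode == "assisted_1" and rng.random() < p_lookup:
            read = 0.0
            for _ in range(max_suggestions):
                read += 5.0  # assumed cost of reading one suggestion
                if rng.random() < 0.3:  # assumed chance the suggestion is usable
                    work = read + 0.4 * work  # post-editing replaces full translation
                    break
            else:
                work = read + work  # no usable suggestion: reading time is lost

        free_at[i] += work

    return max(free_at)


if __name__ == "__main__":
    for mode in ("manual_1", "manual_2", "assisted_1"):
        print(mode, round(simulate(500, mode), 1))
```

Varying `max_suggestions` and `p_lookup` in such a model shows the trade-off the abstract refers to: reading more suggestions raises the chance of finding a usable one but also raises the time lost when none is usable. The numbers this sketch prints depend entirely on the assumed constants and say nothing about the paper's measured results.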



Author information


Corresponding author

Correspondence to Riccardo Martoglia.


About this article

Cite this article

Mandreoli, F., Martoglia, R. & Tiberio, P. EXTRA: a system for example-based translation assistance. Machine Translation 20, 167–197 (2006). https://doi.org/10.1007/s10590-007-9023-0


  • DOI: https://doi.org/10.1007/s10590-007-9023-0
