Abstract
In this paper we present EXTRA (EXample-based TRanslation Assistant), a translation memory (TM) system. EXTRA proposes effective translation suggestions by relying on syntactic analysis of the text and on a rigorous, language-independent measure; its advanced retrieval techniques keep the search efficient even over large amounts of bilingual text. EXTRA does not use external knowledge requiring the intervention of users, and it is completely customizable and portable because it is implemented on top of a standard database management system (DBMS). The paper provides a thorough evaluation of both the effectiveness and the efficiency of our system. In particular, to quantify the benefits of EXTRA-assisted translation over manual translation, we introduce a simulator implementing specifically devised statistical, process-oriented, discrete-event models. As far as we know, this is the first time statistical simulation experiments have been used to address the nontrivial problem of evaluating TM systems, particularly for comparing the time that could be saved by performing assisted translation rather than "manual" translation and for optimally tuning the system behaviour for differently skilled users. In our experiments, we considered three scenarios: manual translation with one translator, manual translation with two translators, and assisted translation with one translator. The time one translator needs for an assisted translation is significantly closer to that of a team of two translators than to that of a single translator working manually. The mean sentence translation time is by far the lowest in the assisted scenario, which corresponds to the highest per-translator productivity. We also estimate the total translation time as the number of query sentences, the maximum number of suggestions to be read, and the probability of look-up are varied: the best trade-off is obtained by presenting, and reading, at most four or five suggestions.
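To make the three scenarios concrete, the following is a minimal Monte Carlo sketch of how such a comparison could be set up. It is not the simulator described in the paper: the function name `simulate`, the exponential distribution for manual translation time, and every numeric constant (mean time, reading cost per suggestion, usefulness probability, speed-up factor) are hypothetical assumptions chosen only for illustration. Only the three varied quantities (number of sentences, maximum number of suggestions read, probability of look-up) come from the abstract.

```python
import random


def simulate(n_sentences, scenario, max_suggestions=4, p_lookup=0.8, seed=0):
    """Return the total elapsed time (arbitrary units) to translate n_sentences.

    scenario is one of "manual_1", "manual_2", "assisted_1".
    All distributions and constants below are illustrative assumptions,
    not values taken from the paper.
    """
    if scenario not in ("manual_1", "manual_2", "assisted_1"):
        raise ValueError(f"unknown scenario: {scenario}")

    # Separate random streams so the manual effort per sentence is identical
    # across scenarios for a given seed.
    rng_work = random.Random(seed)
    rng_tm = random.Random(seed + 1)

    n_translators = 2 if scenario == "manual_2" else 1
    free_at = [0.0] * n_translators  # time at which each translator is free

    for _ in range(n_sentences):
        # Dispatch the next sentence to whoever becomes free first.
        who = min(range(n_translators), key=free_at.__getitem__)

        # Assumed: manual translation time is exponential with mean 60 units.
        manual_time = rng_work.expovariate(1 / 60.0)
        duration = manual_time

        if scenario == "assisted_1" and rng_tm.random() < p_lookup:
            # Assumed: the TM returns 0..max_suggestions suggestions, and the
            # translator reads all of them at 5 time units each.
            n_read = rng_tm.randint(0, max_suggestions)
            read_time = 5.0 * n_read
            # Assumed: each suggestion is useful with probability 0.3, and a
            # useful suggestion cuts the translation effort to 40%.
            useful = any(rng_tm.random() < 0.3 for _ in range(n_read))
            duration = read_time + (0.4 * manual_time if useful else manual_time)

        free_at[who] += duration

    # The job is done when the last translator finishes.
    return max(free_at)


if __name__ == "__main__":
    for name in ("manual_1", "manual_2", "assisted_1"):
        print(name, round(simulate(500, name, seed=42), 1))
```

Under these assumptions, reading more suggestions raises the chance of finding a useful one but also adds reading time, which is the kind of trade-off the abstract refers to; sweeping `max_suggestions` and `p_lookup` in this sketch shows the mechanism, though the numbers it produces say nothing about EXTRA itself.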
Cite this article
Mandreoli, F., Martoglia, R. & Tiberio, P. EXTRA: a system for example-based translation assistance. Machine Translation 20, 167–197 (2006). https://doi.org/10.1007/s10590-007-9023-0
DOI: https://doi.org/10.1007/s10590-007-9023-0