Toward Practical Spoken Language Translation

  • Original Article
Machine Translation

Abstract

This paper argues that the time is now right to field practical Spoken Language Translation (SLT) systems. Several sorts of practical systems can be built over the next few years if system builders recognize that, at the present state of the art, users must cooperate and compromise with the programs. Further, SLT systems can be arranged on a scale, in terms of the degree of cooperation or compromise they require from users. In general, the broader the intended linguistic or topical coverage of a system, the more user cooperation or compromise it will presently require. The paper briefly discusses the component technologies of SLT systems as they relate to user cooperation and accommodation (“human factors engineering”), with examples from the authors’ work. It describes three classes of “cooperative” SLT systems which could be put into practical use during the next few years.

Author information

Corresponding author

Correspondence to Chengqing Zong.

Additional information

All trademarks are hereby acknowledged. All URLs last accessed between 6th and 25th January 2006.

About this article

Cite this article

Zong, C., Seligman, M. Toward Practical Spoken Language Translation. Machine Translation 19, 113–137 (2005). https://doi.org/10.1007/s10590-006-9000-z

  • DOI: https://doi.org/10.1007/s10590-006-9000-z
