
Distributed speech translation technologies for multiparty multilingual communication

Published: 02 August 2012

Abstract

Developing a multilingual speech translation system requires constructing automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS) components for all possible source and target languages. If the many ASR, MT, and TTS systems developed independently around the world for different language pairs could be connected, speech translation would become possible for a multitude of language pairs. Yet there is currently no common, flexible framework that supports the entire speech translation process by bringing together heterogeneous speech translation components. In this article we therefore propose a distributed architecture framework for multilingual speech translation in which all speech translation components run on distributed servers and cooperate over a network. This framework facilitates the connection of different components and functions. To show the overall mechanism, we first present our state-of-the-art technologies for the multilingual ASR, MT, and TTS components, and then describe how to combine those systems in the proposed network-based framework. The client applications are implemented on a handheld mobile terminal device, and all data exchanges between client users and spoken language technology servers are managed through a Web protocol. To support multiparty communication, an additional communication server simultaneously distributes the speech translation results from one user to multiple users. Field testing shows that the system can realize multiparty multilingual speech translation for real-time, location-independent communication.
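
The data flow described in the abstract (ASR, then MT, then TTS, with a communication server fanning results out to several listeners) can be illustrated with a small in-process sketch. This is not the authors' implementation: the function names `asr_server`, `mt_server`, and `tts_server`, the `CommunicationServer` class, and the toy phrase table are all hypothetical, and plain function calls stand in for the Web protocol exchanges between distributed servers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the remote ASR, MT, and TTS servers.
# In the article these are separate network services; here they are
# plain functions so the example runs without any network access.

def asr_server(audio: bytes, lang: str) -> str:
    """Pretend recognizer: treats the audio payload as UTF-8 text."""
    return audio.decode("utf-8")

# Toy phrase table, invented purely for illustration.
PHRASES: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "ja"): {"hello": "konnichiwa"},
    ("en", "id"): {"hello": "halo"},
}

def mt_server(text: str, src: str, tgt: str) -> str:
    """Pretend translator: looks the utterance up in the toy table."""
    if src == tgt:
        return text
    return PHRASES.get((src, tgt), {}).get(text.lower(), text)

def tts_server(text: str, lang: str) -> bytes:
    """Pretend synthesizer: wraps the text in a tagged byte string."""
    return f"<{lang}-audio:{text}>".encode("utf-8")

@dataclass
class Client:
    name: str
    lang: str
    # Each inbox entry is (sender name, translated text, synthesized audio).
    inbox: List[Tuple[str, str, bytes]] = field(default_factory=list)

class CommunicationServer:
    """Distributes one user's translated utterance to all other users."""

    def __init__(self) -> None:
        self.clients: Dict[str, Client] = {}

    def join(self, client: Client) -> None:
        self.clients[client.name] = client

    def speak(self, sender_name: str, audio: bytes) -> str:
        sender = self.clients[sender_name]
        # Recognize once in the speaker's language.
        recognized = asr_server(audio, sender.lang)
        # Translate and synthesize separately for each listener's language.
        for listener in self.clients.values():
            if listener is sender:
                continue
            translated = mt_server(recognized, sender.lang, listener.lang)
            speech = tts_server(translated, listener.lang)
            listener.inbox.append((sender.name, translated, speech))
        return recognized

# Usage: three participants with three different languages.
room = CommunicationServer()
alice, kenji, budi = Client("alice", "en"), Client("kenji", "ja"), Client("budi", "id")
for participant in (alice, kenji, budi):
    room.join(participant)
room.speak("alice", b"hello")
print(kenji.inbox[0][1])  # konnichiwa
print(budi.inbox[0][1])   # halo
```

In this sketch recognition happens once per utterance while translation and synthesis happen once per listener, which is one plausible way to read "distributing the speech translation results from one user to multiple users"; the article itself may organize the work differently.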

Cited By

  • (2021) Mobile Platform Speech Intelligent Recognition and Translation APP. 2021 13th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), 595-598. DOI: 10.1109/ICMTMA52658.2021.00137. Online publication date: Jan-2021
  • (2017) Listening while speaking: Speech chain by deep learning. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 301-308. DOI: 10.1109/ASRU.2017.8268950. Online publication date: Dec-2017

Published In

ACM Transactions on Speech and Language Processing, Volume 9, Issue 2
July 2012
58 pages
ISSN:1550-4875
EISSN:1550-4883
DOI:10.1145/2287710

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 August 2012
Accepted: 01 March 2012
Revised: 01 February 2012
Received: 01 August 2011
Published in TSLP Volume 9, Issue 2

Author Tags

  1. Distributed architecture platforms
  2. machine translation
  3. multiparty multilingual communication
  4. speech recognition
  5. speech-to-speech translation
  6. text-to-speech

Qualifiers

  • Research-article
  • Research
  • Refereed
