Quaero Speech-to-Text and Text Translation Evaluation Systems

Stüker, Sebastian; Kilgour, Kevin; Niehues, Jan

doi:10.1007/978-3-642-15748-6_38


Abstract

Our laboratory has used the HP XC4000, the high performance computer of the federal state Baden-Württemberg, in order to participate in the second Quaero evaluation for automatic speech recognition (ASR) and machine translation (MT). State-of-the-art automatic speech recognition and machine translation systems use stochastic models which are trained on large amounts of training data using techniques from the field of machine learning. Using these techniques, the systems search for the most likely speech recognition hypothesis or translation hypothesis, respectively.
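
The search for the most likely hypothesis is commonly written as a Bayes decision rule. The formulation below is the standard textbook form and is not taken from the chapter itself; the symbols X (acoustic observations), W (word sequence), f (source sentence) and e (target sentence) are notation assumed here for illustration.

```latex
% Speech recognition: most likely word sequence W given the acoustics X
\hat{W} = \operatorname*{arg\,max}_{W} P(W \mid X)
        = \operatorname*{arg\,max}_{W} \, p(X \mid W)\, P(W)

% Machine translation: most likely target sentence e given the source sentence f
\hat{e} = \operatorname*{arg\,max}_{e} P(e \mid f)
```

In the speech recognition case, p(X | W) plays the role of the acoustic model and P(W) that of the language model; both are stochastic models estimated from training data.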

The 2009 evaluation systems are further developments of the 2008 evaluation systems which incorporate more training data and updated models. The speech recognition and machine translation models were, at least in part, trained on the XC4000 high performance cluster. The speech recognition evaluation itself was also mainly executed on the XC4000.

In this paper we report on the newly developed systems and on how we utilized the XC4000 to train their models and to run the actual evaluation.



Author information

Authors and Affiliations

Research Group 3-01 ‘Multilingual Speech Recognition’, Karlsruhe Institute of Technology, Karlsruhe, Germany
Sebastian Stüker & Kevin Kilgour
Interactive Systems Laboratories, Karlsruhe Institute of Technology, Karlsruhe, Germany
Jan Niehues


Corresponding author

Correspondence to Sebastian Stüker.

Editor information

Editors and Affiliations

Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), TU Dresden, Dresden, 01062, Germany
Wolfgang E. Nagel
Abt. Angewandte Mathematik, Universität Freiburg, Hermann-Herder-Str. 10, Freiburg, 79104, Germany
Dietmar B. Kröner
Höchstleistungsrechenzentrum Stuttgart (HLRS), Universität Stuttgart, Nobelstraße 19, Stuttgart, 70569, Germany
Michael M. Resch


About this paper

Cite this paper

Stüker, S., Kilgour, K., Niehues, J. (2011). Quaero Speech-to-Text and Text Translation Evaluation Systems. In: Nagel, W., Kröner, D., Resch, M. (eds) High Performance Computing in Science and Engineering '10. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15748-6_38


DOI: https://doi.org/10.1007/978-3-642-15748-6_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15747-9
Online ISBN: 978-3-642-15748-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
