Abstract
Our laboratory used the HP XC4000, the high-performance computer of the federal state of Baden-Württemberg, to participate in the second Quaero evaluation for automatic speech recognition (ASR) and machine translation (MT). State-of-the-art automatic speech recognition and machine translation systems use stochastic models that are trained on large amounts of training data using techniques from the field of machine learning. Using these techniques, the systems search for the most likely speech recognition hypothesis or translation hypothesis, respectively.
The 2009 evaluation systems are further developments of the 2008 evaluation systems, incorporating more training data and updated models. The speech recognition and machine translation models were, at least in part, trained on the XC4000 high-performance cluster. The speech recognition evaluation itself was also mainly executed on the XC4000.
In this paper we report on the newly developed systems and on how we utilized the XC4000 to train their models and to run the actual evaluation.
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stüker, S., Kilgour, K., Niehues, J. (2011). Quaero Speech-to-Text and Text Translation Evaluation Systems. In: Nagel, W., Kröner, D., Resch, M. (eds) High Performance Computing in Science and Engineering '10. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15748-6_38
DOI: https://doi.org/10.1007/978-3-642-15748-6_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15747-9
Online ISBN: 978-3-642-15748-6
eBook Packages: Mathematics and Statistics (R0)