Web resources for language modeling in conversational speech recognition

Authors:
Ivan Bulyko, BBN Technologies, Cambridge, MA
Mari Ostendorf, University of Washington
Manhung Siu, Hong Kong University of Science and Technology
Tim Ng, Hong Kong University of Science and Technology
Andreas Stolcke, SRI International and the International Computer Science Institute
Özgür Çetin, International Computer Science Institute

ACM Transactions on Speech and Language Processing, Volume 5, Issue 1, Article No. 1, pp. 1–25
https://doi.org/10.1145/1322391.1322392

Published: 12 December 2007

Abstract

This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can be used in a variety of ways; here, the different sources are combined in two types of mixture models. Focusing on conversational speech where data collection can be quite costly, experiments demonstrate the positive impact of Web collections on several tasks with varying amounts of data, including Mandarin and English telephone conversations and English meetings and lectures.
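
As a rough illustration of the mixture idea, the sketch below linearly interpolates two component models and fits the interpolation weights with EM on held-out target-domain text. Everything in it is an assumption made for illustration: the abstract does not say which two mixture types are used, and the unigram components, the toy sentences, and the function names (`unigram_model`, `em_mixture_weights`, `perplexity`) are hypothetical stand-ins for real n-gram models and corpora.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def em_mixture_weights(models, heldout, iterations=50):
    """Fit linear-interpolation weights by EM on held-out tokens."""
    k = len(models)
    weights = [1.0 / k] * k
    for _ in range(iterations):
        expected = [0.0] * k
        for w in heldout:
            # E-step: posterior probability that each component produced w.
            joint = [weights[i] * models[i][w] for i in range(k)]
            norm = sum(joint)
            for i in range(k):
                expected[i] += joint[i] / norm
        # M-step: new weights are the average posteriors.
        weights = [e / len(heldout) for e in expected]
    return weights


def perplexity(models, weights, tokens):
    """Perplexity of the interpolated model on a token sequence."""
    log_prob = 0.0
    for w in tokens:
        log_prob += math.log(sum(lam * m[w] for lam, m in zip(weights, models)))
    return math.exp(-log_prob / len(tokens))


# Toy data (hypothetical): a small in-domain sample, a web sample,
# and held-out text in the target conversational style.
in_domain = "yeah i know i mean you know it is like yeah".split()
web = "the report is on the web and the page is like the news".split()
heldout = "yeah you know i mean it is like the news".split()

vocab = sorted(set(in_domain) | set(web) | set(heldout))
models = [unigram_model(in_domain, vocab), unigram_model(web, vocab)]
weights = em_mixture_weights(models, heldout)

print("mixture weights:", weights)
print("held-out perplexity:", perplexity(models, weights, heldout))
```

Because EM never decreases held-out likelihood, the fitted weights give a perplexity no worse than the uniform starting point. With real data the components would be smoothed higher-order n-gram models, one per source.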

Index Terms

Computing methodologies > Artificial intelligence > Natural language processing > Speech recognition

Published in

ACM Transactions on Speech and Language Processing Volume 5, Issue 1
December 2007
80 pages
ISSN: 1550-4875
EISSN: 1550-4883
DOI: 10.1145/1322391

Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 December 2007
- Accepted: 1 August 2007
- Received: 1 November 2005
Author Tags
Conversational speech
Web data
language modeling
Qualifiers
- research-article
- Research
- Refereed