research-article

Web resources for language modeling in conversational speech recognition

Published: 12 December 2007

Abstract

This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can be used in a variety of ways; here, the different sources are combined in two types of mixture models. Focusing on conversational speech where data collection can be quite costly, experiments demonstrate the positive impact of Web collections on several tasks with varying amounts of data, including Mandarin and English telephone conversations and English meetings and lectures.
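
The abstract does not say which two mixture types are used, so the following is only a minimal sketch of the simplest common case: static linear interpolation of several source-specific language models, with the interpolation weights estimated by EM on held-out target-domain text. The input format (`token_probs`), the function names, and the toy numbers are assumptions made for illustration and are not taken from the article.

```python
import math


def em_mixture_weights(token_probs, iterations=50):
    """Estimate linear-interpolation weights by EM.

    token_probs[i][k] is the probability that component model k
    (e.g. in-domain transcripts, Web text) assigns to held-out token i
    given its history. This input format is a hypothetical convention.
    """
    num_models = len(token_probs[0])
    weights = [1.0 / num_models] * num_models  # start from uniform weights
    for _ in range(iterations):
        totals = [0.0] * num_models
        for probs in token_probs:
            mix = sum(w * p for w, p in zip(weights, probs))
            if mix == 0.0:
                continue  # no component covers this token; skip it
            # E-step: posterior responsibility of each component
            for k in range(num_models):
                totals[k] += weights[k] * probs[k] / mix
        norm = sum(totals)
        if norm == 0.0:
            break  # nothing to re-estimate from
        # M-step: new weight is the average responsibility
        weights = [t / norm for t in totals]
    return weights


def perplexity(token_probs, weights):
    """Perplexity of the interpolated model on the held-out tokens.

    Assumes every token has non-zero probability under the mixture.
    """
    log_sum = sum(
        math.log(sum(w * p for w, p in zip(weights, probs)))
        for probs in token_probs
    )
    return math.exp(-log_sum / len(token_probs))


if __name__ == "__main__":
    # Toy held-out data: column 0 = in-domain model, column 1 = Web model.
    held_out = [[0.20, 0.01], [0.05, 0.10], [0.30, 0.02], [0.01, 0.20]]
    learned = em_mixture_weights(held_out)
    print("weights:", learned)
    print("perplexity (uniform):", perplexity(held_out, [0.5, 0.5]))
    print("perplexity (EM):", perplexity(held_out, learned))
```

Because each EM iteration cannot decrease the held-out likelihood, the learned weights give a perplexity no worse than the uniform starting point on the same held-out data. In practice the component probabilities would come from smoothed n-gram models trained on each source.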

    • Published in

      ACM Transactions on Speech and Language Processing, Volume 5, Issue 1
      December 2007
      80 pages
      ISSN: 1550-4875
      EISSN: 1550-4883
      DOI: 10.1145/1322391

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 December 2007
      • Accepted: 1 August 2007
      • Received: 1 November 2005

      Qualifiers

      • research-article
      • Research
      • Refereed