Improving Voice Search Using Forward-Backward LVCSR System Combination

The Sixth International Symposium on Neural Networks (ISNN 2009)

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 56)

Abstract

Voice search is the technology that enables users to access information using spoken queries. The automatic speech recognizer (ASR) is one of the key modules of a voice search system. However, the high error rate of state-of-the-art large vocabulary continuous speech recognition (LVCSR) is the bottleneck for most voice search systems. In this paper, we first build a baseline system using a language model (LM) with domain-specific information. To improve this system, we propose a forward-backward LVCSR system combination method that decreases search errors in speech recognition, which in turn improves spoken language understanding (SLU) performance. Experimental results show that the proposed method yields a 5.7% relative CER reduction in speech recognition and a 1.5% absolute increase in the SLU F1-measure on our test set.
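The abstract reports one gain as a relative reduction and the other as an absolute increase, and the two are easy to confuse. The following is a minimal sketch of how such figures are conventionally computed. It assumes that CER denotes character error rate, computed as edit distance divided by reference length. The function names and toy inputs are illustrative only and are not taken from the chapter.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate (assumed definition): edits / reference length."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Relative reduction: the drop expressed as a fraction of the baseline."""
    return (baseline - improved) / baseline


def absolute_change(baseline, improved):
    """Absolute change: the plain difference between the two values."""
    return improved - baseline


def f1(precision, recall):
    """F1-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy strings (hypothetical, not data from the chapter).
toy_cer = cer("abcd", "abxd")  # one substitution over four characters
```

Under these definitions, a relative reduction scales with the baseline error rate, whereas an absolute change is a direct difference in percentage points, so the two numbers in the abstract are not directly comparable.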




© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Li, T., Bao, C., Xu, W., Pan, J., Yan, Y. (2009). Improving Voice Search Using Forward-Backward LVCSR System Combination. In: Wang, H., Shen, Y., Huang, T., Zeng, Z. (eds) The Sixth International Symposium on Neural Networks (ISNN 2009). Advances in Intelligent and Soft Computing, vol 56. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01216-7_82

  • DOI: https://doi.org/10.1007/978-3-642-01216-7_82

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01215-0

  • Online ISBN: 978-3-642-01216-7

  • eBook Packages: Engineering, Engineering (R0)
