Sequence-Based Pronunciation Modeling Using a Noisy-Channel Approach

  • Conference paper
Spoken Dialogue Systems for Ambient Environments (IWSDS 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6392)

Abstract

Previous approaches to spontaneous speech recognition address the multiple-pronunciation problem by modeling pronunciation alterations at the phoneme-to-phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this paper we attempt to model sequence-based pronunciation variation using a noisy-channel approach, in which the spontaneous phoneme sequence is treated as a “noisy” string and the goal is to recover the “clean” string of the word sequence. In this way, the whole word sequence and its effect on the alteration of the phonemes are taken into consideration. Moreover, the system learns not only the phoneme transformation but also the mapping from phonemes to words directly. In this preliminary study, the phonemes are first recognized with the existing recognition system, and the pronunciation variation model based on the noisy-channel approach then maps from the phoneme level to the word level. Our experiments use Switchboard as the spontaneous speech corpus. The results show that the proposed method consistently improves word accuracy over the conventional recognition system. The best system achieves up to 38.9% relative improvement over the baseline speech recognition system.
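
The noisy-channel idea in the abstract can be illustrated with a toy sketch. It is not the authors' system, and the abstract does not say how their models are built or trained. The sketch assumes the standard noisy-channel decision rule: given an observed phoneme sequence S, pick the word sequence W that maximizes P(W) · P(S | W). The tables `LANGUAGE_MODEL` and `CHANNEL_MODEL`, the function `decode`, the phoneme symbols, and every probability are hypothetical values invented for illustration.

```python
import math

# Hypothetical prior P(W) over candidate word sequences (invented numbers).
LANGUAGE_MODEL = {
    ("going", "to"): 0.6,
    ("gonna",): 0.1,
    ("go", "into"): 0.3,
}

# Hypothetical channel model P(S | W): how likely a word sequence is to be
# realised as a given spontaneous phoneme sequence (invented numbers).
CHANNEL_MODEL = {
    ("going", "to"): {
        ("g", "ah", "n", "ah"): 0.5,                  # reduced form
        ("g", "ow", "ih", "ng", "t", "uw"): 0.5,      # careful form
    },
    ("gonna",): {
        ("g", "ah", "n", "ah"): 0.9,
    },
    ("go", "into"): {
        ("g", "ow", "ih", "n", "t", "uw"): 1.0,
    },
}


def decode(phonemes):
    """Return the word sequence W maximising log P(W) + log P(S | W).

    Returns None when no candidate can generate the observed phonemes.
    """
    best_words, best_score = None, -math.inf
    for words, prior in LANGUAGE_MODEL.items():
        likelihood = CHANNEL_MODEL.get(words, {}).get(phonemes, 0.0)
        if likelihood == 0.0:
            continue  # this candidate cannot explain the observation
        score = math.log(prior) + math.log(likelihood)
        if score > best_score:
            best_words, best_score = words, score
    return best_words


if __name__ == "__main__":
    print(decode(("g", "ah", "n", "ah")))
```

In this toy setting the whole phoneme string is scored against whole word sequences, so the reduced form is mapped to words directly rather than phoneme by phoneme. A real system would search over a far larger hypothesis space with learned models.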

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hofmann, H., Sakti, S., Isotani, R., Kawai, H., Nakamura, S., Minker, W. (2010). Sequence-Based Pronunciation Modeling Using a Noisy-Channel Approach. In: Lee, G.G., Mariani, J., Minker, W., Nakamura, S. (eds) Spoken Dialogue Systems for Ambient Environments. IWSDS 2010. Lecture Notes in Computer Science, vol 6392. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16202-2_15

  • DOI: https://doi.org/10.1007/978-3-642-16202-2_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16201-5

  • Online ISBN: 978-3-642-16202-2

  • eBook Packages: Computer Science, Computer Science (R0)
