Abstract

This paper treats the modeling of Korean pronunciation variation as a sequence labeling task in which tokens in the source language (phonemic symbols) are labeled with tokens in the target language (orthographic Korean transcription). The model uses conditional random fields (CRFs), undirected graphical models trained to maximize the posterior probability of the target label sequence given the input source sequence. In this study, the proposed CRF-based pronunciation variation model is applied in our Korean large-vocabulary continuous speech recognition (LVCSR) system after standard hidden Markov model (HMM)-based recognition of the phonemic syllables of the actual pronunciation (surface forms). The goal is then to output a sequence of Korean orthography given a sequence of phonemic syllable surface forms. Experimental results show that the proposed CRF model could help enhance our Korean LVCSR system.
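To make the labeling formulation concrete, the following is a minimal sketch of how a linear-chain CRF picks the best orthographic label sequence for a sequence of surface-form syllables, using Viterbi decoding. It is not the authors' implementation: the function name `viterbi`, the label set, and all weights are hypothetical values chosen by hand for illustration, whereas a real model would learn its weights from aligned training data. The toy input reflects Korean nasal assimilation, where the word written 국물 is pronounced 궁물.

```python
from typing import Dict, List, Tuple

Weights = Dict[Tuple[str, str], float]

def viterbi(obs: List[str], labels: List[str],
            emit: Weights, trans: Weights) -> List[str]:
    """Return the highest-scoring label path under a linear-chain CRF.

    emit[(observation, label)] and trans[(prev_label, label)] are
    log-potential weights; missing entries count as 0.0.
    """
    if not obs:
        return []
    # best[t][y]: score of the best label path ending in y at position t
    best = [{y: emit.get((obs[0], y), 0.0) for y in labels}]
    back = []  # back[t-1][y]: predecessor of y on that best path
    for t in range(1, len(obs)):
        scores, ptrs = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans.get((p, y), 0.0))
            scores[y] = (best[-1][prev] + trans.get((prev, y), 0.0)
                         + emit.get((obs[t], y), 0.0))
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path

# Hypothetical toy weights (assumptions, not values from the paper).
LABELS = ["국", "궁", "물"]
EMIT = {("궁", "궁"): 1.0, ("궁", "국"): 0.8, ("물", "물"): 2.0}
TRANS = {("국", "물"): 1.5, ("궁", "물"): 0.2}

print(viterbi(["궁", "물"], LABELS, EMIT, TRANS))  # ['국', '물']
```

In this toy setup the transition weight between 국 and 물 outweighs the slightly higher emission score for copying 궁 unchanged, so the decoder recovers the orthographic form from the pronounced one. This is the kind of context-dependent decision that sequence labeling is meant to capture.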

Author information

Corresponding author

Correspondence to Sakriani Sakti.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this paper

Cite this paper

Sakti, S., Finch, A., Hori, C., Kashioka, H., Nakamura, S. (2011). Conditional Random Fields for Modeling Korean Pronunciation Variation. In: Delgado, RC., Kobayashi, T. (eds) Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems Workshop. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1335-6_7

  • DOI: https://doi.org/10.1007/978-1-4614-1335-6_7

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-1334-9

  • Online ISBN: 978-1-4614-1335-6

  • eBook Packages: Engineering, Engineering (R0)
