Conditional Random Fields for Modeling Korean Pronunciation Variation

Sakti, Sakriani; Finch, Andrew; Hori, Chiori; Kashioka, Hideki; Nakamura, Satoshi

doi:10.1007/978-1-4614-1335-6_7

Abstract

This paper models Korean pronunciation variation as a sequence labeling task in which tokens in the source language (phonemic symbols) are labeled with tokens in the target language (orthographic Korean transcription). We use conditional random fields (CRFs), undirected graphical models trained to maximize the posterior probability of the target label sequence given the input source sequence. In this study, the proposed CRF-based pronunciation variation model is applied to our Korean large-vocabulary continuous speech recognition (LVCSR) system after standard hidden Markov model (HMM)-based recognition of the phonemic syllables of the actual pronunciation (surface forms). The goal is then to output a sequence of Korean orthography given a sequence of phonemic syllable surface forms. Experimental results show that the proposed CRF model could help enhance our Korean LVCSR system.
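
As an illustration of the labeling task described in the abstract, the following is a minimal sketch of linear-chain CRF scoring and Viterbi decoding in Python. It is not the authors' model: the label set, the feature weights, and the function names (`score`, `viterbi`, `posterior`) are assumptions chosen for this example, and the weights are hand-set rather than learned. The example uses the well-known nasal assimilation in 국물, whose surface pronunciation is 궁물.

```python
from itertools import product
from math import exp

# Hypothetical label set (orthographic syllables). A real system would use
# the full syllable inventory and weights learned from aligned training data.
LABELS = ["국", "궁", "물"]

# Hand-set illustrative weights, NOT learned values.
EMISSION = {  # weight for (surface syllable, orthographic label)
    ("궁", "궁"): 1.0,
    ("궁", "국"): 0.8,
    ("물", "물"): 2.0,
}
TRANSITION = {  # weight for (previous label, current label)
    ("국", "물"): 1.5,
    ("궁", "물"): 0.2,
}


def score(surface, labels):
    """Unnormalized log-score of a label sequence for a surface sequence."""
    total = 0.0
    prev = None
    for x, y in zip(surface, labels):
        total += EMISSION.get((x, y), 0.0)
        if prev is not None:
            total += TRANSITION.get((prev, y), 0.0)
        prev = y
    return total


def viterbi(surface):
    """Return (best score, best label path) under the weights above."""
    # best[y] holds the highest-scoring path that ends in label y.
    best = {y: (EMISSION.get((surface[0], y), 0.0), [y]) for y in LABELS}
    for x in surface[1:]:
        new_best = {}
        for y in LABELS:
            candidates = [
                (s + TRANSITION.get((p, y), 0.0) + EMISSION.get((x, y), 0.0),
                 path + [y])
                for p, (s, path) in best.items()
            ]
            new_best[y] = max(candidates, key=lambda c: c[0])
        best = new_best
    return max(best.values(), key=lambda c: c[0])


def posterior(surface, labels):
    """Posterior p(labels | surface), normalized by brute-force enumeration.

    Enumeration is only feasible for toy inputs; practical CRF
    implementations compute the normalizer with the forward algorithm.
    """
    z = sum(exp(score(surface, ys))
            for ys in product(LABELS, repeat=len(surface)))
    return exp(score(surface, labels)) / z


if __name__ == "__main__":
    surface_form = ["궁", "물"]  # surface pronunciation of 국물
    best_score, best_path = viterbi(surface_form)
    print("".join(best_path), posterior(surface_form, best_path))
```

With these toy weights, the transition feature for 국 followed by 물 outweighs the slightly higher emission weight for copying 궁 unchanged, so the decoder recovers the orthographic form 국물 from the surface form 궁물. This shows how context features allow a CRF to undo a pronunciation change that a syllable-by-syllable mapping would miss.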

Author information

Sakriani Sakti & Satoshi Nakamura
Present address: Nara Institute of Science and Technology (NAIST), Ikoma, Japan

Authors and Affiliations

National Institute of Information and Communications Technology (NICT), Koganei, Japan
Sakriani Sakti, Andrew Finch, Chiori Hori, Hideki Kashioka & Satoshi Nakamura

Corresponding author

Correspondence to Sakriani Sakti.

Editor information

Editors and Affiliations

Ramón López-Cózar Delgado, Dept. of Languages and Computer Systems, University of Granada, Granada, 18071, Spain
Tetsunori Kobayashi, Dept. of Computer Science & Engineering, Waseda University, Okubo 3-4-1, Tokyo, 169-8555, Japan

Cite this paper

Sakti, S., Finch, A., Hori, C., Kashioka, H., Nakamura, S. (2011). Conditional Random Fields for Modeling Korean Pronunciation Variation. In: Delgado, RC., Kobayashi, T. (eds) Proceedings of the Paralinguistic Information and its Integration in Spoken Dialogue Systems Workshop. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1335-6_7

DOI: https://doi.org/10.1007/978-1-4614-1335-6_7
Published: 12 August 2011
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1334-9
Online ISBN: 978-1-4614-1335-6
eBook Packages: Engineering, Engineering (R0)
