Category-Pattern-Based Korean Word-Spacing

Kang, Mi-young; Jung, Sung-won; Kwon, Hyuk-chul

doi:10.1007/11940098_30

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Included in the following conference series:

International Conference on Computer Processing of Oriental Languages

Abstract

In a stochastic word-spacing model, data sparseness is difficult to handle without enlarging the dictionary. To address both data sparseness and dictionary memory size, this paper describes a process that dynamically provides candidate words and detects correct words using morpheme unigrams and their categories. Each candidate word’s probability is estimated from the probabilities of its morphemes, each weighted according to its category. The category weights are trained to minimize the mean error between a word’s observed probability and the probability estimated from its individual morpheme probabilities, each weighted by its category power in the category pattern that produces the word.
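
The abstract does not give the estimation formula, so the following Python sketch is only one plausible reading of it, not the authors' method. It assumes that a word's estimated probability is the product of its morpheme unigram probabilities, each raised to a weight attached to the morpheme's category, and that the weights are fitted by gradient descent on the mean squared error in log-probability space. The morphemes, category labels (`N`, `J`), probabilities, and "true" weights are all hypothetical toy values chosen for illustration.

```python
import math

# Hypothetical morpheme unigram probabilities (toy values, not from the paper).
UNIGRAM = {"hakgyo": 0.01, "jip": 0.02, "e": 0.05}


def estimate_log_prob(morphemes, weights):
    """Log of the assumed estimate: sum of weights[category] * log P(morpheme)."""
    return sum(weights[cat] * math.log(UNIGRAM[m]) for m, cat in morphemes)


def train_category_weights(samples, categories, lr=0.02, epochs=1000):
    """Fit category weights by gradient descent on the mean squared log error."""
    weights = {c: 1.0 for c in categories}  # start with every category unweighted
    n = len(samples)
    for _ in range(epochs):
        grad = {c: 0.0 for c in categories}
        for morphemes, observed in samples:
            # Error between estimated and observed log-probability of the word.
            err = estimate_log_prob(morphemes, weights) - math.log(observed)
            for m, cat in morphemes:
                grad[cat] += 2.0 * err * math.log(UNIGRAM[m]) / n
        for c in categories:
            weights[c] -= lr * grad[c]
    return weights


# Synthetic check: generate "observed" probabilities from known weights
# (an assumption made purely so the fit can be verified) and recover them.
TRUE_WEIGHTS = {"N": 0.8, "J": 0.5}
PATTERNS = [
    [("hakgyo", "N")],              # category pattern: N
    [("hakgyo", "N"), ("e", "J")],  # category pattern: N + J
    [("jip", "N"), ("e", "J")],     # category pattern: N + J
]
samples = [(p, math.exp(estimate_log_prob(p, TRUE_WEIGHTS))) for p in PATTERNS]
learned = train_category_weights(samples, ["N", "J"])
print(learned)
```

Working in log space makes the estimate linear in the weights, which keeps this toy fit a simple least-squares problem; the paper may define the error and the training procedure differently.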

Author information

Authors and Affiliations

Korean Language Processing Laboratory, Department of Computer Science Engineering, Pusan National University,
Mi-young Kang, Sung-won Jung & Hyuk-chul Kwon
Center for U-Port IT Research and Education, Pusan National University, Jangjeon-dong, Geumjeong-gu, 609-735, Busan, Korea
Mi-young Kang, Sung-won Jung & Hyuk-chul Kwon

Editor information

Editors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 630-0192, Takayama, Ikoma, Nara, Japan
Yuji Matsumoto
Dept of ECE, University of Illinois at Urbana Champaign, IL 61801, Urbana, USA
Richard W. Sproat
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
State Key Lab of Intelligent Tech. & Sys., Tsinghua University,
Min Zhang

About this paper

Cite this paper

Kang, My., Jung, Sw., Kwon, Hc. (2006). Category-Pattern-Based Korean Word-Spacing. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_30

DOI: https://doi.org/10.1007/11940098_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)
