short-paper

Korean Part-of-speech Tagging Based on Morpheme Generation

Authors:
Hyun-Je Song, Jeonbuk National University, Jeonju, Republic of Korea
Seong-Bae Park, Kyung Hee University, Yongin, Republic of Korea

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 3, Article No. 41, pp. 1–10. https://doi.org/10.1145/3373608

Published: 09 January 2020


Abstract

Korean part-of-speech (POS) tagging faces two major problems: a word-spacing unit does not map one-to-one to a POS tag, and morphemes must be recovered during tagging. This article proposes a novel two-step Korean POS tagger that addresses both problems. The tagger first uses an encoder-decoder architecture learned from a POS-tagged corpus to generate a sequence of lemmatized and recovered morphemes, each of which maps one-to-one to a POS tag. A standard sequence labeling method then determines the POS tag of each morpheme in the generated sequence. Because the knowledge for segmenting and recovering morphemes is extracted automatically from the POS-tagged corpus by the encoder-decoder architecture, the tagger is constructed without a dictionary or handcrafted linguistic rules. Experimental results on a standard dataset show that the proposed method outperforms existing POS taggers, achieving state-of-the-art performance.
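The data flow of the two steps can be illustrated with a minimal Python sketch. This is not the authors' implementation: the two lookup tables below are hypothetical stand-ins for the learned encoder-decoder (step 1) and the learned sequence labeler (step 2), the example sentence and the Sejong-style tags are assumptions chosen for illustration, and a real sequence labeler would use the surrounding context rather than a context-free lookup.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the encoder-decoder of step 1:
# maps a word-spacing unit to its lemmatized, recovered morphemes.
MORPHEME_TABLE: Dict[str, List[str]] = {
    "나는": ["나", "는"],
    "학교에": ["학교", "에"],
    # The surface syllable "갔" is recovered as the stem "가" plus "았".
    "갔다": ["가", "았", "다"],
}

# Hypothetical stand-in for the sequence labeler of step 2:
# maps each morpheme to one POS tag (Sejong-style tags, assumed here).
TAG_TABLE: Dict[str, str] = {
    "나": "NP",
    "는": "JX",
    "학교": "NNG",
    "에": "JKB",
    "가": "VV",
    "았": "EP",
    "다": "EF",
}


def generate_morphemes(sentence: str) -> List[str]:
    """Step 1: turn word-spacing units into a flat morpheme sequence."""
    morphemes: List[str] = []
    for unit in sentence.split():
        # Unknown units are passed through unchanged in this toy version.
        morphemes.extend(MORPHEME_TABLE.get(unit, [unit]))
    return morphemes


def label_morphemes(morphemes: List[str]) -> List[Tuple[str, str]]:
    """Step 2: assign exactly one POS tag to each generated morpheme."""
    return [(m, TAG_TABLE.get(m, "UNK")) for m in morphemes]


def tag(sentence: str) -> List[Tuple[str, str]]:
    """Run both steps in sequence."""
    return label_morphemes(generate_morphemes(sentence))


if __name__ == "__main__":
    for morpheme, pos in tag("나는 학교에 갔다"):
        print(f"{morpheme}/{pos}")
```

In this toy example, three word-spacing units expand into seven morphemes, so the tagging step operates on a sequence in which each element receives exactly one tag.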


Index Terms

Korean Part-of-speech Tagging Based on Morpheme Generation
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks


Published in
ACM Transactions on Asian and Low-Resource Language Information Processing Volume 19, Issue 3
May 2020
228 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3378675
Editor:
Imed Zitouni
Microsoft, USA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 January 2020
- Accepted: 1 November 2019
- Revised: 1 July 2019
- Received: 1 September 2017
Published in tallip Volume 19, Issue 3

Author Tags
Part-of-speech tagging
morpheme generation
morphologically complex languages
Qualifiers
- short-paper
- Research
- Refereed
