Identification of Maximal-Length Noun Phrases Based on Expanded Chunks and Classified Punctuations in Chinese

Bai, Xue-Mei; Li, Jin-Ji; Kim, Dong-Il; Lee, Jong-Hyeok

doi:10.1007/11940098_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Included in the following conference series:

International Conference on Computer Processing of Oriental Languages

Abstract

In general, there are two types of noun phrases (NPs): Base Noun Phrases (BNPs) and Maximal-Length Noun Phrases (MNPs). MNP identification can substantially reduce the complexity of full parsing, help analyze the overall structure of complex sentences, and provide important clues for detecting the main predicates of Chinese sentences. In this paper, we propose a two-phase hybrid approach to MNP identification that uses salient features such as expanded chunks and classified punctuations to improve performance. Experimental results show an F₁-measure of 89.66%.

Expanded chunks and classified punctuations are explained in detail in Section 3.
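The abstract reports performance as an F₁-measure, the harmonic mean of precision P and recall R: F₁ = 2PR / (P + R). The following minimal sketch shows how such a score can be computed for MNP identification. It assumes, since the abstract does not specify the evaluation protocol, that each MNP is represented as a (start, end) token span and that a predicted MNP counts as correct only when both of its boundaries exactly match a gold MNP, as in CoNLL-2000-style chunk evaluation. The function name `prf1` and the toy spans are hypothetical and not taken from the paper.

```python
from typing import Iterable, Set, Tuple

# A phrase span as (start, end) token offsets, end exclusive.
Span = Tuple[int, int]


def prf1(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Return exact-match precision, recall and F1 over phrase spans.

    Assumption (not from the paper): a predicted MNP is correct only if
    both boundaries match a gold MNP exactly.
    """
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    correct = len(gold_set & pred_set)  # spans found in both sets
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy data, not results from the paper.
    gold = [(0, 4), (6, 9), (11, 14)]
    predicted = [(0, 4), (6, 8), (11, 14), (15, 17)]
    p, r, f = prf1(gold, predicted)
    print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")  # P=0.5000 R=0.6667 F1=0.5714
```

In the toy example, two of the four predicted spans are correct (precision 1/2) and two of the three gold spans are recovered (recall 2/3), so F₁ = 4/7. The span (6, 8) earns no partial credit under exact-match scoring, which is why boundary errors weigh heavily for long phrases such as MNPs.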

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Electrical and Computer Engineering Division and Advanced Information Technology Research Center (AITrc), Pohang University of Science and Technology (POSTECH), San 31 Hyoja Dong, Pohang, 790-784, R. Korea
Xue-Mei Bai, Jin-Ji Li & Jong-Hyeok Lee
Language Engineering Institute, Department of Computer, Electron and Telecommunication Engineering, Yanbian University of Science and Technology (YUST), Yanji, Jilin, 133-000, P.R. China
Dong-Il Kim

Editor information

Editors and Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 630-0192, Takayama, Ikoma, Nara, Japan
Yuji Matsumoto
Dept. of ECE, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Richard W. Sproat
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Kam-Fai Wong
State Key Lab of Intelligent Tech. & Sys., Tsinghua University,
Min Zhang

About this paper

Cite this paper

Bai, XM., Li, JJ., Kim, DI., Lee, JH. (2006). Identification of Maximal-Length Noun Phrases Based on Expanded Chunks and Classified Punctuations in Chinese. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_28

DOI: https://doi.org/10.1007/11940098_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science (R0)