Word Frequency Statistics Model for Chinese Base Noun Phrase Identification

Kong, Lu; Ren, Fuji; Sun, Xiao; Quan, Changqin

doi:10.1007/978-3-319-09339-0_64

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8589)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Chinese base phrase identification plays an important role in natural language processing, but both its recognition scope and its methods still need improvement. This paper presents a method for Chinese base noun phrase identification based on a word frequency statistics model. The method builds a noun phrase dictionary from a training corpus, calculates the co-occurrence frequency and threshold of the noun phrases, and constructs word tables according to the different roles that words play in a noun phrase. Unknown word processing and rule templates are added, and the results are finally improved with error correction processing. Experiments on the test corpus show that the average precision and average recall of base noun phrase identification across different domains are 91.28% and 93.22%, respectively.
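
The abstract does not give the model's details, so the following is only a minimal sketch of the general idea of a phrase dictionary plus a co-occurrence frequency threshold. The toy corpus, the function names (`build_model`, `identify`, `precision_recall`), the threshold value, and the greedy chaining of adjacent words are all assumptions made for illustration and are not taken from the paper. The word role tables, unknown word processing, rule templates, and error correction mentioned above are not modelled here.

```python
from collections import Counter

# Hypothetical toy training data (not from the paper): each item is a
# segmented sentence plus the (start, end) spans of its base noun phrases.
TRAIN = [
    (["自然", "语言", "处理", "很", "重要"], [(0, 3)]),
    (["他", "研究", "自然", "语言", "处理"], [(2, 5)]),
    (["语言", "处理", "需要", "训练", "语料"], [(0, 2), (3, 5)]),
]


def build_model(train):
    """Count annotated phrases and adjacent word pairs inside them."""
    phrase_dict = Counter()  # noun phrase dictionary: phrase -> frequency
    pair_freq = Counter()    # co-occurrence of adjacent words within a phrase
    for words, spans in train:
        for start, end in spans:
            phrase = tuple(words[start:end])
            phrase_dict[phrase] += 1
            for left, right in zip(phrase, phrase[1:]):
                pair_freq[(left, right)] += 1
    return phrase_dict, pair_freq


def identify(words, pair_freq, threshold=2):
    """Greedily chain adjacent words whose co-occurrence count >= threshold.

    Returns a list of (start, end) spans, end exclusive. The threshold
    default is an illustrative assumption.
    """
    spans = []
    i = 0
    while i < len(words) - 1:
        j = i
        while (j < len(words) - 1
               and pair_freq[(words[j], words[j + 1])] >= threshold):
            j += 1
        if j > i:
            spans.append((i, j + 1))
            i = j + 1
        else:
            i += 1
    return spans


def precision_recall(predicted, gold):
    """Exact-match precision and recall over phrase spans."""
    correct = len(set(predicted) & set(gold))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


phrase_dict, pair_freq = build_model(TRAIN)
test_words = ["我们", "使用", "自然", "语言", "处理", "和", "训练", "语料"]
gold_spans = [(2, 5), (6, 8)]
predicted_spans = identify(test_words, pair_freq, threshold=2)
print(predicted_spans, precision_recall(predicted_spans, gold_spans))
```

In this toy setting, lowering the threshold admits the rarer pair and raises recall, which illustrates why the choice of threshold matters; how the paper actually derives its threshold is not stated in the abstract.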

Author information

Authors and Affiliations

Anhui Province Key Laboratory of Affective Computing and Advanced Intelligent Machine, School of Computer and Information, Hefei University of Technology, Hefei, 230009, China
Lu Kong, Fuji Ren, Xiao Sun & Changqin Quan
Faculty of Engineering, University of Tokushima, 2-1 Minami-Josanjima, Tokushima, 770-8506, Japan
Fuji Ren

Editor information

Editors and Affiliations

School of Electronics and Information Engineering, Tongji University, 4800 Caoan Road, 201804, Shanghai, China
De-Shuang Huang
School of Electrical Engineering, University of Ulsan, 680–749, Ulsan, Korea
Kang-Hyun Jo
Department of Automation, Tsinghua University, Beijing, China
Ling Wang

About this paper

Cite this paper

Kong, L., Ren, F., Sun, X., Quan, C. (2014). Word Frequency Statistics Model for Chinese Base Noun Phrase Identification. In: Huang, DS., Jo, KH., Wang, L. (eds) Intelligent Computing Methodologies. ICIC 2014. Lecture Notes in Computer Science, vol 8589. Springer, Cham. https://doi.org/10.1007/978-3-319-09339-0_64

DOI: https://doi.org/10.1007/978-3-319-09339-0_64
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09338-3
Online ISBN: 978-3-319-09339-0
eBook Packages: Computer Science, Computer Science (R0)
