Learning Concepts from Text Based on the Inner-Constructive Model

Wang, Shi; Cao, Yanan; Cao, Xinyu; Cao, Cungen

doi:10.1007/978-3-540-76719-0_27

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4798)

Abstract

This paper presents a new model for the automatic acquisition of lexical concepts from text, referred to as the Concept Inner-Constructive Model (CICM). The CICM captures the rules by which words construct concepts along four aspects: (1) parts of speech, (2) syllables, (3) senses, and (4) attributes. First, we extract a large number of candidate concepts using lexico-patterns and confirm a subset of them as concepts when they match enough patterns a sufficient number of times. We then learn CICMs automatically from the confirmed concepts and use the model to identify further concepts. Essentially, the CICM is an instance-based learning model, but it differs from most existing models in that it takes into account a variety of linguistic features as well as statistical features of words. To make analogy more effective when learning new concepts with CICMs, we cluster similar words based on density. The method has been evaluated on a 160G raw corpus, from which 5,344,982 concepts were extracted with a precision of 89.11% and a recall of 84.23%.
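
The candidate-confirmation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the actual lexico-patterns or thresholds, so the English patterns in `PATTERNS`, the function name `confirm_concepts`, and the parameters `min_patterns` and `min_hits` are all hypothetical stand-ins chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical English stand-ins for lexico-patterns; the paper's actual
# patterns are not given in the abstract.
PATTERNS = [
    re.compile(r"\b(\w+) is a kind of \w+"),
    re.compile(r"\bsuch as (\w+)"),
    re.compile(r"\b(\w+) and other \w+"),
]


def confirm_concepts(sentences, min_patterns=2, min_hits=2):
    """Return candidates matched by at least `min_patterns` distinct
    patterns and at least `min_hits` times in total (assumed thresholds)."""
    patterns_seen = defaultdict(set)  # candidate -> indices of matching patterns
    hits = defaultdict(int)           # candidate -> total number of matches
    for sentence in sentences:
        text = sentence.lower()
        for idx, pattern in enumerate(PATTERNS):
            for match in pattern.finditer(text):
                candidate = match.group(1)
                patterns_seen[candidate].add(idx)
                hits[candidate] += 1
    return {
        c for c in hits
        if len(patterns_seen[c]) >= min_patterns and hits[c] >= min_hits
    }


# Toy corpus, invented for this example only.
corpus = [
    "Granite is a kind of rock.",
    "Hard stones such as granite resist weathering.",
    "Granite and other minerals were sampled.",
    "Colors such as blue were popular.",
]
confirmed = confirm_concepts(corpus)
print(confirmed)
```

In this toy run, "granite" is matched by three distinct patterns and is confirmed, while "blue" is matched by only one pattern and remains an unconfirmed candidate. In the approach the abstract describes, such unconfirmed candidates are the ones later judged with the learned CICMs.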

This work is supported by the National Natural Science Foundation of China under Grant Nos. 60496326, 60573063, and 60573064, and by the National 863 Program under Grant No. 2007AA01Z325.

Author information

Authors and Affiliations

Graduate University of Chinese Academy of Sciences, Beijing, 100049, China
Shi Wang, Yanan Cao & Xinyu Cao
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100080, China
Shi Wang, Yanan Cao, Xinyu Cao & Cungen Cao

Editor information

Zili Zhang, Jörg Siekmann

About this paper

Cite this paper

Wang, S., Cao, Y., Cao, X., Cao, C. (2007). Learning Concepts from Text Based on the Inner-Constructive Model. In: Zhang, Z., Siekmann, J. (eds) Knowledge Science, Engineering and Management. KSEM 2007. Lecture Notes in Computer Science, vol 4798. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76719-0_27

DOI: https://doi.org/10.1007/978-3-540-76719-0_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76718-3
Online ISBN: 978-3-540-76719-0
eBook Packages: Computer Science, Computer Science (R0)