The Research of Document Clustering Topical Concept Based on Neural Networks

Fu, Xian; Ding, Yi

doi:10.1007/978-3-319-12436-0_69

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8866)

Included in the following conference series:

International Symposium on Neural Networks

Abstract

Document clustering is now widely used in text mining, information retrieval systems, and related applications. When a neural network is used to identify topical concepts, the key problem is how to construct the network's input. This paper presents an input model for a neural network that uses a statistical method to calculate the mutual information between an ambiguous word and its contextual words, taking a fixed number of contextual words on either side of the topical concept according to a window (-M, +N). We also introduce a novel topical document clustering method called Document Characters Indexing Clustering (DCIC), which can identify topics accurately and cluster documents according to these topics. In DCIC, “topic elements” are defined and extracted for indexing base clusters. In addition, document characters are investigated and exploited. Experimental results show that DCIC based on BP neural network models achieves a higher precision (92.76%) than some widely used traditional clustering methods.
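
To make the (-M, +N) context window concrete, the following is a minimal sketch of how mutual information between a target word and its neighbouring words might be estimated from raw counts. It is not the authors' implementation: the function names `context_window` and `mutual_information`, the count-based pointwise estimate, and the toy sentences are all assumptions made for illustration, since the abstract does not specify the exact formula or the corpus.

```python
import math
from collections import Counter


def context_window(tokens, i, m, n):
    """Return the m tokens before and the n tokens after position i."""
    left = tokens[max(0, i - m):i]
    right = tokens[i + 1:i + 1 + n]
    return left + right


def mutual_information(sentences, target, m=2, n=2):
    """Estimate pointwise mutual information between `target` and every
    word seen inside its (-m, +n) window.

    Assumed estimate (not taken from the paper):
        MI(w, t) = log2( c(w, t) * T / (c(w) * c(t)) )
    where c(w, t) counts w inside a window around t, c(.) are token
    counts, and T is the total number of tokens.
    """
    word_counts = Counter()
    pair_counts = Counter()
    total = 0
    for tokens in sentences:
        word_counts.update(tokens)
        total += len(tokens)
        for i, tok in enumerate(tokens):
            if tok == target:
                pair_counts.update(context_window(tokens, i, m, n))
    return {
        word: math.log2(joint * total / (word_counts[word] * word_counts[target]))
        for word, joint in pair_counts.items()
    }


# Toy corpus (hypothetical), with "bank" as the ambiguous word.
sentences = [
    ["the", "bank", "of", "the", "river"],
    ["the", "bank", "lends", "money"],
]
scores = mutual_information(sentences, "bank", m=1, n=1)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.3f}")
```

In a setup like the one the abstract describes, scores of this kind for the words in the window could be assembled into a fixed-length vector and fed to the network; how the paper orders or normalises that vector is not stated here.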

This work is supported by the S&T plan projects of the Hubei Provincial Education Department of China (No. Q20122207).

Author information

Authors and Affiliations

Hubei Normal University, Huangshi, 435002, China
Xian Fu & Yi Ding

Corresponding author

Correspondence to Xian Fu.

Editor information

Editors and Affiliations

Wuhan, China
Zhigang Zeng
University of Macau, Macau, Macao
Yangmin Li
The Chinese University of Hong Kong, Hong Kong SAR
Irwin King

About this paper

Cite this paper

Fu, X., Ding, Y. (2014). The Research of Document Clustering Topical Concept Based on Neural Networks. In: Zeng, Z., Li, Y., King, I. (eds) Advances in Neural Networks – ISNN 2014. ISNN 2014. Lecture Notes in Computer Science, vol 8866. Springer, Cham. https://doi.org/10.1007/978-3-319-12436-0_69

DOI: https://doi.org/10.1007/978-3-319-12436-0_69
Published: 19 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12435-3
Online ISBN: 978-3-319-12436-0
eBook Packages: Computer Science, Computer Science (R0)