
The Research of Document Clustering Topical Concept Based on Neural Networks

  • Conference paper
Advances in Neural Networks – ISNN 2014 (ISNN 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8866)


Abstract

Document clustering is now widely used in text mining, information retrieval systems, and related applications. When a neural network is used to identify topical concepts, the key problem is how to construct the network's input. This paper presents an input model for the neural network that uses a statistical method to calculate the mutual information between an ambiguous word and its contextual words, taking a fixed number of contextual words on each side of the topical concept according to a (-M, +N) window. We also introduce a novel topical document clustering method, Document Characters Indexing Clustering (DCIC), which identifies topics accurately and clusters documents according to those topics. In DCIC, “topic elements” are defined and extracted to index base clusters, and document characters are investigated and exploited. Experimental results show that DCIC based on a BP (back-propagation) neural network model achieves a higher precision (92.76%) than several widely used traditional clustering methods.
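The abstract describes the network input only in outline: mutual information between an ambiguous word and the contextual words that fall in a (-M, +N) window around the topical concept. The Python sketch below shows one common way to compute such scores. It assumes pointwise mutual information estimated from raw corpus counts, with M tokens taken before and N tokens after each occurrence of the target word; the function names `context_window`, `window_mutual_information`, and `top_context_words` and the probability estimates are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter


def context_window(tokens, index, m, n):
    """Return up to m tokens before and n tokens after tokens[index],
    i.e. a (-m, +n) window that excludes the target token itself."""
    left = tokens[max(0, index - m):index]
    right = tokens[index + 1:index + 1 + n]
    return left + right


def window_mutual_information(documents, target, m, n):
    """Estimate pointwise mutual information between `target` and every word
    seen in its (-m, +n) context window.

    Assumed estimates (hypothetical, not taken from the paper):
      P(w)    = count(w) / total tokens
      P(t, c) = occurrences of c inside target windows / total tokens
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in documents:
        word_counts.update(tokens)
        for i, tok in enumerate(tokens):
            if tok == target:
                pair_counts.update(context_window(tokens, i, m, n))

    total = sum(word_counts.values())
    if word_counts[target] == 0:
        return {}  # target never occurs, so there is nothing to score
    p_target = word_counts[target] / total

    scores = {}
    for word, joint in pair_counts.items():
        p_joint = joint / total
        p_word = word_counts[word] / total
        scores[word] = math.log2(p_joint / (p_target * p_word))
    return scores


def top_context_words(scores, k):
    """Hypothetical input selection: keep the k highest-scoring context words
    (ties broken alphabetically) to give the network a fixed-size input."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


docs = [
    ["the", "bank", "raised", "interest", "rates"],
    ["river", "bank", "was", "muddy"],
    ["bank", "interest", "rates", "fell"],
]
scores = window_mutual_information(docs, "bank", 1, 2)
features = top_context_words(scores, 3)
```

Here `window_mutual_information(docs, "bank", 1, 2)` scores every word seen within one token before or two tokens after "bank", and `top_context_words` keeps the highest-scoring ones. Raw PMI tends to favour rare words, so a real system would likely add a minimum-count threshold or smoothing before using these scores as network inputs.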

The work is supported by the S&T plan projects of Hubei Provincial Education Department of China (No.Q20122207).



Author information


Correspondence to Xian Fu.


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Fu, X., Ding, Y. (2014). The Research of Document Clustering Topical Concept Based on Neural Networks. In: Zeng, Z., Li, Y., King, I. (eds) Advances in Neural Networks – ISNN 2014. ISNN 2014. Lecture Notes in Computer Science, vol 8866. Springer, Cham. https://doi.org/10.1007/978-3-319-12436-0_69


  • DOI: https://doi.org/10.1007/978-3-319-12436-0_69


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12435-3

  • Online ISBN: 978-3-319-12436-0

  • eBook Packages: Computer Science, Computer Science (R0)
