An Extended Chameleon Algorithm for Document Clustering

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 320)

Abstract

A great deal of research has been done on concept mining and document similarity in the past few years, but all of this work was based on the statistical analysis of keywords. The major challenge in this area is preserving the semantics of terms and phrases. Our paper proposes a graph model that represents concepts at the sentence level, with each concept following a triplet representation. A modified DBSCAN algorithm is used to cluster the extracted concepts. The resulting clusters form a belief network, or probabilistic network, which we use to extract the most probable concepts in the document. In this paper we also propose a new algorithm for document similarity, together with an extended Chameleon algorithm for comparing belief networks.
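
To make the clustering step concrete, the following is a minimal sketch of density-based clustering over concept triplets. It is not the modified algorithm from the paper, whose details are not given in the abstract: it is textbook DBSCAN, and the (subject, predicate, object) triplet layout, the `slot_distance` measure, the sample triplets, and the `eps` and `min_pts` values are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed layout for a concept triplet: (subject, predicate, object).
Triplet = Tuple[str, str, str]


def slot_distance(a: Triplet, b: Triplet) -> float:
    """Hypothetical distance: fraction of the three slots that differ."""
    return sum(x != y for x, y in zip(a, b)) / 3.0


def region_query(points: List[Triplet], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j in range(len(points))
            if slot_distance(points[i], points[j]) <= eps]


def dbscan(points: List[Triplet], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN. Returns one cluster id per point, -1 for noise."""
    UNVISITED, NOISE = -2, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # core point: keep expanding
    return labels


# Illustrative triplets only; they do not come from the paper.
triplets = [
    ("john", "buys", "car"),
    ("john", "buys", "bike"),
    ("mary", "buys", "car"),
    ("cat", "chases", "mouse"),
    ("dog", "chases", "mouse"),
    ("sun", "rises", "east"),
]

# eps = 0.34 links triplets that differ in at most one slot.
labels = dbscan(triplets, eps=0.34, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

With these settings the three "buys" triplets fall into one cluster, the two "chases" triplets into another, and the remaining triplet is labelled noise. A semantic distance, for example one based on WordNet, could be substituted for `slot_distance` without changing the clustering loop.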

Author information

Corresponding author

Correspondence to G. Veena.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Veena, G., Lekha, N.K. (2015). An Extended Chameleon Algorithm for Document Clustering. In: El-Alfy, ES., Thampi, S., Takagi, H., Piramuthu, S., Hanne, T. (eds) Advances in Intelligent Informatics. Advances in Intelligent Systems and Computing, vol 320. Springer, Cham. https://doi.org/10.1007/978-3-319-11218-3_31

  • DOI: https://doi.org/10.1007/978-3-319-11218-3_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11217-6

  • Online ISBN: 978-3-319-11218-3

  • eBook Packages: Engineering, Engineering (R0)
