An Extended Chameleon Algorithm for Document Clustering

Veena, G.; Lekha, N. K.

doi:10.1007/978-3-319-11218-3_31

G. Veena and N. K. Lekha

Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 320)

Abstract

Considerable research has been done on concept mining and document similarity in the past few years, but all of this work was based on the statistical analysis of keywords. The major challenge in this area is preserving the semantics of terms and phrases. This paper proposes a graph model that represents concepts at the sentence level, where each concept follows a triplet representation. A modified DBSCAN algorithm is used to cluster the extracted concepts. The resulting clusters form a belief network, or probabilistic network, which we use to extract the most probable concepts in the document. We also propose a new algorithm for document similarity, and an extended Chameleon algorithm for comparing belief networks.
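
The abstract does not specify the triplet layout or how DBSCAN is modified, so the following is only a minimal sketch of the general idea of density-based clustering over concept triplets. It assumes a (subject, verb, object) layout, a Jaccard distance over tokens, and the standard, unmodified DBSCAN procedure; the names `Triplet`, `jaccard_distance`, and `dbscan`, and the sample data, are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

# Assumed layout: (subject, verb, object). The paper's actual triplet may differ.
Triplet = Tuple[str, str, str]


def jaccard_distance(a: Triplet, b: Triplet) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the lowercase tokens of two triplets."""
    sa = {tok for part in a for tok in part.lower().split()}
    sb = {tok for part in b for tok in part.lower().split()}
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0


def dbscan(items: List[Triplet], eps: float, min_pts: int) -> List[int]:
    """Standard DBSCAN (not the paper's modified variant).

    Returns one cluster label per item; -1 marks noise.
    A point's neighbourhood includes the point itself.
    """
    UNVISITED, NOISE = -2, -1
    labels = [UNVISITED] * len(items)

    def neighbours(i: int) -> List[int]:
        return [j for j in range(len(items))
                if jaccard_distance(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, does not expand
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:     # core point: keep expanding
                queue.extend(nb)
    return labels


# Hypothetical sample concepts, for illustration only.
triplets = [
    ("dog", "chases", "cat"),
    ("dog", "chases", "ball"),
    ("cat", "chases", "mouse"),
    ("bank", "raises", "rates"),
    ("bank", "cuts", "rates"),
    ("sun", "rises", "east"),
]
labels = dbscan(triplets, eps=0.5, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

With these assumed settings the three "chases" concepts form one cluster, the two "bank" concepts form another, and the unrelated concept is left as noise. How such clusters are turned into a belief network is not described in the abstract and is not sketched here.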

Author information

Authors and Affiliations

Dept. of Computer Science and Application, Amrita Vishwa Vidyapeetham, Amritapuri, India
G. Veena & N. K. Lekha

Corresponding author

Correspondence to G. Veena.

Editor information

Editors and Affiliations

King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
El-Sayed M. El-Alfy
Indian Institute of Information Technology and Management - Kerala (IIITM-K), Trivandrum, Kerala, India
Sabu M. Thampi
Faculty of Design, Kyushu University, Fukuoka, Japan
Hideyuki Takagi
Department of Information Systems and Operations Management, Warrington College of Business, University of Florida, Florida, USA
Selwyn Piramuthu
University of Applied Sciences, Institute for Information Systems, Olten, Switzerland
Thomas Hanne

About this paper

Cite this paper

Veena, G., Lekha, N.K. (2015). An Extended Chameleon Algorithm for Document Clustering. In: El-Alfy, ES., Thampi, S., Takagi, H., Piramuthu, S., Hanne, T. (eds) Advances in Intelligent Informatics. Advances in Intelligent Systems and Computing, vol 320. Springer, Cham. https://doi.org/10.1007/978-3-319-11218-3_31

DOI: https://doi.org/10.1007/978-3-319-11218-3_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11217-6
Online ISBN: 978-3-319-11218-3
eBook Packages: Engineering, Engineering (R0)
