Abstract
Considerable research has been done on concept mining and document similarity in the past few years. However, these approaches rely on the statistical analysis of keywords. The major challenge in this area is preserving the semantics of terms and phrases. This paper proposes a graph model that represents concepts at the sentence level, with each concept expressed as a triplet. A modified DBSCAN algorithm is used to cluster the extracted concepts. The resulting clusters form a belief network (probabilistic network), which we use to extract the most probable concepts in the document. We also propose a new algorithm for document similarity, together with an extended Chameleon algorithm for comparing belief networks.
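To make the clustering step concrete, the following is a minimal sketch, not the authors' method. It assumes each concept is a (subject, predicate, object) triplet, uses a hypothetical slot-mismatch distance, and applies textbook DBSCAN rather than the modified variant the paper proposes. The names `Triplet`, `distance`, and `dbscan`, as well as the sample concepts and parameter values, are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed layout for a concept: (subject, predicate, object).
Triplet = Tuple[str, str, str]


def distance(a: Triplet, b: Triplet) -> float:
    """Hypothetical distance: fraction of triplet slots that differ."""
    return sum(x != y for x, y in zip(a, b)) / 3.0


def dbscan(points: List[Triplet], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN: one cluster id per point, -1 marks noise."""
    UNSEEN, NOISE = -2, -1
    labels = [UNSEEN] * len(points)

    def neighbours(i: int) -> List[int]:
        # Brute-force eps-neighbourhood; includes the point itself.
        return [j for j, q in enumerate(points)
                if distance(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] != UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: expand
                queue.extend(reachable)
    return labels


# Illustrative concepts, not taken from the paper.
concepts = [
    ("cat", "chases", "mouse"),
    ("cat", "chases", "rat"),
    ("dog", "chases", "rat"),
    ("market", "raises", "prices"),
    ("market", "raises", "rates"),
    ("bank", "raises", "rates"),
    ("sun", "emits", "light"),
]
# eps=0.34 links triplets that differ in at most one slot.
labels = dbscan(concepts, eps=0.34, min_pts=2)
print(labels)
```

With these settings, triplets that share two of their three slots are chained into the same cluster, and the isolated triplet is labelled as noise. A semantic distance (for example, one based on WordNet) would be a natural replacement for the slot-mismatch measure assumed here.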
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Veena, G., Lekha, N.K. (2015). An Extended Chameleon Algorithm for Document Clustering. In: El-Alfy, ES., Thampi, S., Takagi, H., Piramuthu, S., Hanne, T. (eds) Advances in Intelligent Informatics. Advances in Intelligent Systems and Computing, vol 320. Springer, Cham. https://doi.org/10.1007/978-3-319-11218-3_31
DOI: https://doi.org/10.1007/978-3-319-11218-3_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11217-6
Online ISBN: 978-3-319-11218-3
eBook Packages: Engineering, Engineering (R0)