Dynamic Clustering Based on Minimum Spanning Tree and Context Similarity for Enhancing Document Classification

Anirban Chakrabarty, Sudipta Roy
Copyright: © 2014 | Volume: 4 | Issue: 1 | Pages: 15
ISSN: 2155-6377 | EISSN: 2155-6385 | EISBN13: 9781466654877 | DOI: 10.4018/ijirr.2014010103
Cite Article

MLA

Chakrabarty, Anirban, and Sudipta Roy. "Dynamic Clustering Based on Minimum Spanning Tree and Context Similarity for Enhancing Document Classification." IJIRR, vol. 4, no. 1, 2014, pp. 46-60. http://doi.org/10.4018/ijirr.2014010103

APA

Chakrabarty, A. & Roy, S. (2014). Dynamic Clustering Based on Minimum Spanning Tree and Context Similarity for Enhancing Document Classification. International Journal of Information Retrieval Research (IJIRR), 4(1), 46-60. http://doi.org/10.4018/ijirr.2014010103

Chicago

Chakrabarty, Anirban, and Sudipta Roy. "Dynamic Clustering Based on Minimum Spanning Tree and Context Similarity for Enhancing Document Classification," International Journal of Information Retrieval Research (IJIRR) 4, no. 1 (2014): 46-60. http://doi.org/10.4018/ijirr.2014010103

Abstract

Document classification is the task of assigning a text document to one or more predefined categories according to its content and a set of labeled training samples. Traditional classification schemes use all training samples for classification, so storage requirements and computational complexity grow as the number of features increases. Moreover, commonly used classification techniques assume that the number of categories is known in advance, which may not hold in practice. In practical scenarios, it is essential to determine the number of clusters for an unknown dataset dynamically. To address these limitations, the proposed work develops a text clustering algorithm in which clusters are generated dynamically from a minimum spanning tree that incorporates semantic features. The proposed model can efficiently find the significant matching concepts between documents and can perform multi-category classification. The formal analysis is supported by applications to email and cancer data sets. Cluster quality and accuracy were compared against several widely used text clustering techniques, and the comparison showed the efficiency of the proposed approach.
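
The general idea of letting a minimum spanning tree decide the number of clusters can be illustrated with a minimal sketch. The abstract does not specify the paper's context similarity measure or its edge-cutting rule, so everything below is an assumption made for illustration: plain bag-of-words cosine distance stands in for the semantic similarity, and an MST edge is cut when its weight exceeds the mean edge weight plus k standard deviations. The names cosine_distance, minimum_spanning_tree, mst_clusters, and the parameter k are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns (weight, u, v) edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # cheapest known link from the tree to each node
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v][0]:
                best[v] = (dist[u][v], u)
    return edges


def mst_clusters(docs, k=1.0):
    """Cut MST edges heavier than mean + k * std; return one label per document.

    Assumption: this cutting rule is illustrative, not the paper's criterion.
    """
    vecs = [Counter(d.lower().split()) for d in docs]
    n = len(vecs)
    if n == 0:
        return []
    dist = [[cosine_distance(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    edges = minimum_spanning_tree(dist)
    weights = [w for w, _, _ in edges]
    mean = sum(weights) / len(weights) if weights else 0.0
    std = (
        math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))
        if weights
        else 0.0
    )
    threshold = mean + k * std

    parent = list(range(n))  # union-find over the edges that survive the cut

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for w, u, v in edges:
        if w <= threshold:
            parent[find(u)] = find(v)

    roots = {}  # map each component root to a label in order of first appearance
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    sample = [
        "tumor cell growth cancer therapy",
        "cancer therapy tumor cell treatment",
        "meeting agenda email schedule",
        "email schedule meeting reminder",
    ]
    print(mst_clusters(sample))
```

In this sketch the number of clusters is never supplied by the caller; it falls out of how many MST edges exceed the threshold, which is the property the abstract emphasizes. A real system would likely replace the token-count vectors with a richer semantic representation.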
