On Knowledge-Enhanced Document Clustering

Manjeet Rege, Josan Koruthu, Reynold Bailey

Source Title: International Journal of Information Retrieval Research (IJIRR) 2(3)

ISSN: 2155-6377 | EISSN: 2155-6385 | EISBN13: 9781466612631 | DOI: 10.4018/ijirr.2012070105

MLA

Rege, Manjeet, et al. "On Knowledge-Enhanced Document Clustering." IJIRR vol.2, no.3 2012: pp.72-82. http://doi.org/10.4018/ijirr.2012070105

APA

Rege, M., Koruthu, J., & Bailey, R. (2012). On Knowledge-Enhanced Document Clustering. International Journal of Information Retrieval Research (IJIRR), 2(3), 72-82. http://doi.org/10.4018/ijirr.2012070105

Chicago

Rege, Manjeet, Josan Koruthu, and Reynold Bailey. "On Knowledge-Enhanced Document Clustering," International Journal of Information Retrieval Research (IJIRR) 2, no.3: 72-82. http://doi.org/10.4018/ijirr.2012070105

Abstract

Document clustering plays an important role in text analytics by finding natural groupings of documents based on their similarity, as determined by the words appearing in them. Many of the clustering algorithms accessible through text analytics tools are completely unsupervised. That is, they cannot incorporate any domain knowledge that might be available about the documents to improve clustering accuracy and relevance. The authors present a semi-supervised document clustering algorithm based on graph partitioning. The user provides knowledge about a few of the documents in the form of “must-link” and “cannot-link” constraints between pairs of documents. A “must-link” constraint between two documents expresses the user's judgment that the two documents must be clustered together irrespective of their dissimilarity. Similarly, a “cannot-link” constraint signifies that the two documents should never be clustered together, no matter how similar they might happen to be. These constraints are then incorporated into a computationally efficient document clustering algorithm based on graph partitioning. The proposed framework is validated through experiments performed on publicly available text datasets.
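To make the idea of pairwise constraints concrete, the following is a minimal sketch and not the authors' algorithm, whose details the abstract does not give. It assumes one simple way of encoding constraints: a "must-link" pair is given the maximum edge weight in the document similarity graph and a "cannot-link" pair has its edge removed. The modified graph is then split in two with a standard normalized spectral cut. The function names `apply_constraints` and `spectral_bipartition` and the toy similarity values are hypothetical and chosen only for illustration.

```python
import math


def apply_constraints(W, must_link, cannot_link):
    """Return a copy of the similarity matrix W with constraints encoded as edge weights.

    Assumption for this sketch: similarities lie in [0, 1], so 1.0 is the
    strongest possible edge and 0.0 means "no edge".
    """
    A = [row[:] for row in W]
    for i, j in must_link:
        A[i][j] = A[j][i] = 1.0  # force a maximally strong edge
    for i, j in cannot_link:
        A[i][j] = A[j][i] = 0.0  # remove the edge entirely
    return A


def spectral_bipartition(W, iters=200):
    """Split a weighted graph in two using the second eigenvector of the
    normalized affinity matrix D^{-1/2} W D^{-1/2} (normalized-cut relaxation).

    Assumes W is symmetric and every node has positive degree.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Shift to (M + I) / 2 so all eigenvalues lie in [0, 1]; eigenvectors are unchanged.
    S = [
        [0.5 * (W[i][j] / math.sqrt(deg[i] * deg[j]) + (1.0 if i == j else 0.0))
         for j in range(n)]
        for i in range(n)
    ]
    # The top eigenvector is known in closed form: sqrt(deg), normalized.
    total = math.sqrt(sum(deg))
    top = [math.sqrt(d) / total for d in deg]
    # Power iteration with the top eigenvector projected out at every step.
    # A fixed start vector keeps the sketch deterministic.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        v = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in v))
        v = [a / length for a in v]
    # Documents on the same side of zero as document 0 get label 0.
    return [0 if (x > 0) == (v[0] > 0) else 1 for x in v]


# Toy similarities (made up): documents 0, 1 and 2 share vocabulary, document 3 does not.
W = [
    [0.0, 0.8, 0.7, 0.1],
    [0.8, 0.0, 0.7, 0.1],
    [0.7, 0.7, 0.0, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
# Hypothetical user knowledge: document 2 belongs with 3, and not with 0 or 1.
A = apply_constraints(W, must_link=[(2, 3)], cannot_link=[(0, 2), (1, 2)])
print(spectral_bipartition(A))  # [0, 0, 1, 1]
```

In this toy example the constraints override word-based similarity: document 2 ends up with document 3 even though its vocabulary is closer to documents 0 and 1. Overwriting edge weights is only one of several ways to encode such constraints, and it does not strictly guarantee that every constraint is satisfied in the final partition.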
