DOI: 10.1145/2245276.2245306

Semi-supervised document clustering with dual supervision through seeding

Published: 26 March 2012

Abstract

Semi-supervised clustering algorithms for general problems use a small number of labeled instances or pairwise instance constraints to guide unsupervised clustering. For document clustering, however, user supervision can also take other forms, such as labeling a feature by associating it with a document or a cluster. Besides labeled documents, this paper explores using labeled features to generate the seeds that initialize the clustering. We present a unified framework in which both labeled documents and labeled features are used to seed clusters, and in which this information is refined using intermediate clusters. We introduce two methods of using labeled features to generate cluster seeds. Experimental results on several real-world data sets demonstrate that seeding the clustering with both documents and features can significantly improve document clustering performance over random seeding and document-only seeding.
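The abstract does not spell out the two feature-seeding methods, so the following is only a minimal sketch of the general idea of seeding from both kinds of supervision, under stated assumptions: each cluster's seed is the normalized sum of its labeled documents and a pseudo-document built from its labeled words, and the seeds then initialize a spherical k-means loop. The function names (`build_seeds`, `seeded_kmeans`) and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def normalize(vec):
    """Scale a sparse vector (term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}

def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def mean_vector(vectors):
    """Normalized sum of sparse vectors (empty input gives an empty vector)."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return normalize(total)

def build_seeds(docs, k, labeled_docs, labeled_features):
    """Assumed seeding rule: merge labeled documents with a pseudo-document
    made of the words labeled for the same cluster."""
    seeds = []
    for c in range(k):
        parts = [docs[i] for i, lab in labeled_docs.items() if lab == c]
        words = [w for w, lab in labeled_features.items() if lab == c]
        if words:
            parts.append(normalize({w: 1.0 for w in words}))
        seeds.append(mean_vector(parts))
    return seeds

def seeded_kmeans(docs, seeds, iters=10):
    """Spherical k-means started from the given seed centroids."""
    centroids = list(seeds)
    assign = []
    for _ in range(iters):
        new_assign = [
            max(range(len(centroids)), key=lambda c: cosine(d, centroids[c]))
            for d in docs
        ]
        if new_assign == assign:
            break  # assignments are stable
        assign = new_assign
        for c in range(len(centroids)):
            members = [d for d, a in zip(docs, assign) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assign

# Toy corpus (hypothetical): three sports and three politics documents.
raw = [
    "goal match team goal",
    "team coach match",
    "election vote senate",
    "vote party election vote",
    "coach team season",
    "senate vote bill",
]
docs = [normalize(Counter(text.split())) for text in raw]
labeled_docs = {0: 0}                          # document 0 belongs to cluster 0
labeled_features = {"vote": 1, "election": 1}  # words labeled for cluster 1
seeds = build_seeds(docs, 2, labeled_docs, labeled_features)
labels = seeded_kmeans(docs, seeds)
print(labels)  # prints [0, 0, 1, 1, 0, 1]
```

Here cluster 0 is seeded only by a labeled document and cluster 1 only by labeled words, which shows how the two forms of supervision can share one seeding step. The refinement of seeds through intermediate clusters mentioned in the abstract is not modeled in this sketch.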


Published In

SAC '12: Proceedings of the 27th Annual ACM Symposium on Applied Computing
March 2012
2179 pages
ISBN: 9781450308571
DOI: 10.1145/2245276
Conference Chairs: Sascha Ossowski, Paola Lecca

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document clustering
  2. feature supervision
  3. features
  4. seeding
  5. text cloud
  6. user supervision

Qualifiers

  • Research-article

Conference

SAC 2012: ACM Symposium on Applied Computing
March 26 - 30, 2012
Trento, Italy

Acceptance Rates

SAC '12 Paper Acceptance Rate 270 of 1,056 submissions, 26%;
Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Cited By

  • (2018) A Visual Approach for Interactive Keyterm-Based Clustering. ACM Transactions on Interactive Intelligent Systems 8:1 (1-35). DOI: 10.1145/3181669. Online publication date: 20-Feb-2018
  • (2016) An improved artificial bee colony algorithm for solving semi-supervised clustering. 2016 5th International Conference on Computer Science and Network Technology (ICCSNT) (315-319). DOI: 10.1109/ICCSNT.2016.8070171. Online publication date: Dec-2016
  • (2016) Automatic constraints generation for semisupervised clustering. Soft Computing - A Fusion of Foundations, Methodologies and Applications 20:6 (2329-2339). DOI: 10.1007/s00500-015-1643-3. Online publication date: 1-Jun-2016
  • (2012) Personalized document clustering with dual supervision. Proceedings of the 2012 ACM symposium on Document engineering (161-170). DOI: 10.1145/2361354.2361393. Online publication date: 4-Sep-2012
  • (2012) A unified framework for document clustering with dual supervision. ACM SIGAPP Applied Computing Review 12:2 (53-63). DOI: 10.1145/2340416.2340421. Online publication date: 1-Jun-2012
