Automatic constraints generation for semisupervised clustering: experiences with documents classification

Diaz-Valenzuela, Irene; Loia, Vincenzo; Martin-Bautista, Maria J.; Senatore, Sabrina; Vila, M. Amparo

doi:10.1007/s00500-015-1643-3

Methodologies and Application
Published: 17 March 2015

Volume 20, pages 2329–2339, (2016)
Soft Computing

Abstract

In recent years, semi-supervised clustering has received a lot of attention. It differs from traditional unsupervised approaches in that it uses a small amount of supervision to “steer” the clustering. Unfortunately, in the real world this supervision is not always available: the data to be processed are often too large, so the cost (in time and human resources) of user-provided information is prohibitive. To address this issue, this work presents a method that generates the supervision automatically by analyzing the structure of the data itself. The analysis is performed with a partitional clustering algorithm that discovers relationships between pairs of instances, which may then be used as semi-supervision in the clustering process. The methodology has been studied in the document clustering domain, an area where novel approaches for accurate document classification are strongly needed. Experimental results show the validity of this approach.
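
As a rough illustration of the idea described in the abstract, the sketch below derives pairwise constraints from repeated runs of a partitional algorithm. It is not the authors' method, which the abstract does not detail. The choice of k-means, the rule that a pair clustered together in every run becomes a must-link (and in no run a cannot-link), and the names `kmeans` and `generate_constraints` are all assumptions made for this example.

```python
import random
from itertools import combinations


def kmeans(points, k, seed, iters=20):
    """Plain k-means on tuples of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers taken from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def generate_constraints(points, k, runs=10):
    """Hypothetical rule: pairs co-clustered in every run become must-link,
    pairs co-clustered in no run become cannot-link."""
    together = {pair: 0 for pair in combinations(range(len(points)), 2)}
    for seed in range(runs):
        labels = kmeans(points, k, seed)
        for i, j in together:
            if labels[i] == labels[j]:
                together[(i, j)] += 1
    must_link = [pair for pair, n in together.items() if n == runs]
    cannot_link = [pair for pair, n in together.items() if n == 0]
    return must_link, cannot_link


# Toy data: two well-separated groups standing in for document vectors.
docs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
must_link, cannot_link = generate_constraints(docs, k=2, runs=10)
print("must-link:", must_link)
print("cannot-link:", cannot_link)
```

The resulting pairs could then be passed to any constrained clustering algorithm as instance-level supervision. Pairs on which the runs disagree are simply left unconstrained.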

Acknowledgments

This work has been partially funded by the Spanish Ministry of Education under the “Programa de Formación del Profesorado Universitario (FPU)” and the Short Stays Program from CEI-Biotic (University of Granada).

Author information

Authors and Affiliations

Department of Computer Science and Artificial Intelligence, University of Granada, 18071, Granada, Spain
Irene Diaz-Valenzuela, Maria J. Martin-Bautista & M. Amparo Vila
Dipartimento di Informatica, Università degli Studi di Salerno, 84084, Fisciano, SA, Italy
Vincenzo Loia & Sabrina Senatore

Corresponding author

Correspondence to Sabrina Senatore.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Diaz-Valenzuela, I., Loia, V., Martin-Bautista, M.J. et al. Automatic constraints generation for semisupervised clustering: experiences with documents classification. Soft Comput 20, 2329–2339 (2016). https://doi.org/10.1007/s00500-015-1643-3

Published: 17 March 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s00500-015-1643-3
