A global optimization method for semi-supervised clustering

Data Mining and Knowledge Discovery

Abstract

In this paper, we adapt Tuy’s concave cutting plane method to semi-supervised clustering. We also establish properties of locally optimal solutions of the semi-supervised clustering problem. Numerical examples show that the method can find better solutions than other semi-supervised clustering algorithms.
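To make the underlying problem concrete, here is a minimal Python sketch of one common formulation of semi-supervised clustering. It is assumed here, not taken from the paper: minimize the sum of squared distances from points to their cluster centroids, subject to must-link and cannot-link pair constraints. The sketch finds the global optimum by exhaustive enumeration, which is only practical for a handful of points. It does not implement Tuy’s concave cutting plane method. The function names `sse`, `feasible`, and `brute_force_constrained_sse` are hypothetical.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

# Assumed formulation (illustrative only): minimum sum-of-squared-error
# clustering with must-link / cannot-link pairs given as index pairs.
Point = Sequence[float]
Pair = Tuple[int, int]


def sse(points: Sequence[Point], labels: Sequence[int], k: int) -> float:
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
    return total


def feasible(labels: Sequence[int], must_link: Sequence[Pair], cannot_link: Sequence[Pair]) -> bool:
    """True if every must-link pair shares a cluster and no cannot-link pair does."""
    return all(labels[i] == labels[j] for i, j in must_link) and all(
        labels[i] != labels[j] for i, j in cannot_link
    )


def brute_force_constrained_sse(
    points: Sequence[Point], k: int, must_link: Sequence[Pair], cannot_link: Sequence[Pair]
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Globally optimal constrained partition by exhaustive search over k**n labelings."""
    best_labels: Optional[Tuple[int, ...]] = None
    best_cost = float("inf")
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) != k:  # every cluster must be non-empty
            continue
        if not feasible(labels, must_link, cannot_link):
            continue
        cost = sse(points, labels, k)
        if cost < best_cost:  # strict comparison keeps the first optimum found
            best_labels, best_cost = labels, cost
    return best_labels, best_cost


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
    print(brute_force_constrained_sse(pts, 2, [], []))        # ((0, 0, 1, 1), 1.0)
    print(brute_force_constrained_sse(pts, 2, [], [(0, 1)]))  # labels (0, 1, 1, 1), cost about 60.67
```

In this toy example, the cannot-link pair (0, 1) rules out the natural split {0, 1} | {10, 11}, so the constrained optimum is much more costly. The number of candidate labelings grows as k^n, which is why exhaustive search does not scale beyond very small instances.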

References

  • Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research 6: 937–965

  • Basu S, Banerjee A, Mooney RJ (2003) Semi-supervised clustering by seeding. In: Sammut C, Hoffmann AG (eds) ICML: Machine learning, proceedings of the nineteenth international conference (ICML 2002). University of New South Wales, Sydney, Australia, Morgan Kaufmann, July 8–12, 2002, pp 27–34

  • Basu S, Bilenko M, Mooney RJ (2004) A probabilistic framework for semi-supervised clustering. In: Kim W, Kohavi R, Gehrke J, DuMouchel W (eds) KDD: proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. Seattle, Washington, USA, ACM, August 22–25, pp 59–68

  • Bilenko M, Basu S, Mooney RJ (2004) Integrating constraints and metric learning in semi-supervised clustering. In: ICML’04: proceedings of the twenty-first international conference on machine learning. ACM Press, New York, NY, USA, p 11

  • Chang H, Yeung D-Y (2006) Locally linear metric adaptation with application to semi-supervised clustering and image retrieval. Pattern Recognition 39(7): 1253–1264

  • Cohn D, Caruana R, McCallum A (2003) Semi-supervised clustering with user feedback. Technical report, Cornell University

  • Davidson I, Ravi SS (2005) Clustering with constraints: feasibility issues and the k-means algorithm. In: Proceedings of the 2005 SIAM international conference on data mining

  • Demiriz A, Bennett KP, Embrechts MJ (1999) Semi-supervised clustering using genetic algorithms. In: Proceedings of ANNIE’99 (Artificial Neural Networks in Engineering). R.P.I. Math Report No. 9901, Rensselaer Polytechnic Institute, Troy, New York

  • Drineas P, Frieze AM, Kannan R, Vempala S, Vinay V (2004) Clustering large graphs via the singular value decomposition. Mach Learn 56(1–3): 9–33

  • Forrest JJ, Goldfarb D (1992) Steepest-edge simplex algorithms for linear programming. Math Program 57(3, Ser. A): 341–374

  • Forrest JJH, Tomlin JA (1992) Implementing the simplex method for the optimization subroutine library. IBM Syst J 31(1): 11–25

  • Freund RW, Jarre F (1997) A QMR-based interior-point algorithm for solving linear programs. Math Program 76(1, Ser. B): 183–210. Interior point methods in theory and practice (Iowa City, IA, 1994)

  • Gao J, Tan P-N, Cheng H (2006) Semi-supervised clustering with partial background information. In: Ghosh J, Lambert D, Skillicorn DB, Srivastava J (eds) SDM’06: proceedings of the sixth SIAM international conference on data mining. SIAM, Bethesda, MD, USA, April 20–22

  • Gordon AD (1996) A survey of constrained classification. Comput Stat Data Anal 21(1): 17–29

  • Horst R, Tuy H (1993) Global optimization. Springer-Verlag, Berlin

  • Jain AK, Mallapragada PK, Law M (2006) Bayesian feedback in data clustering. In: ICPR. IEEE Computer Society, pp 374–378

  • Klein D, Kamvar SD, Manning CD (2002) From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering. In: Proceedings of the international conference on machine learning

  • Lange T, Law MHC, Jain AK, Buhmann JM (2005) Learning with constrained and unlabelled data. In: CVPR’05: proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), vol 1. IEEE Computer Society, Washington, DC, USA, pp 731–738

  • Murphy PM, Aha DW (1994) UCI repository of machine learning databases. Technical report, University of California, Department of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/~mlearn/MLRepository.html

  • Nemhauser GL, Wolsey LA (1988) Integer and combinatorial optimization. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley, New York

  • Nesterov Y (2004) Introductory lectures on convex optimization, volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA (A basic course)

  • Nesterov Y, Nemirovskii A (1994) Interior-point polynomial algorithms in convex programming, volume 13 of SIAM studies in applied mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA

  • Shental N, Bar-Hillel A, Hertz T, Weinshall D (2003) Computing Gaussian mixture models with EM using equivalence constraints. In: Thrun S, Saul LK, Schölkopf B (eds) NIPS. MIT Press

  • Tuy H (1964) Concave programming under linear constraints. Soviet Math 5: 1437–1440

  • Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of the 17th international conference on machine learning. Morgan Kaufmann, San Francisco, CA, pp 1103–1110

  • Wagstaff K, Cardie C, Rogers S, Schroedl S (2001) Constrained k-means clustering with background knowledge. In: ICML’01: proceedings of the eighteenth international conference on machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 577–584

  • Xia Y (2007) Constrained clustering via concavity cuts. In: CPAIOR’07: proceedings of the fourth international conference on integration of AI and OR techniques in constraint programming for combinatorial optimization problems, pp 318–331. http://dx.doi.org/10.1007/978-3-540-72397-4_23

  • Xia Y, Peng J (2005) A cutting algorithm for the minimum sum-of-squared error clustering. In: Proceedings of the fifth SIAM international conference on data mining, pp 150–160

  • Xing EP, Ng AY, Jordan MI, Russell S (2002) Distance metric learning with application to clustering with side-information. In: Thrun S, Becker S, Obermayer K (eds) Advances in neural information processing systems, vol 15. MIT Press, Cambridge, MA, pp 505–512

Author information

Corresponding author

Correspondence to Yu Xia.

Additional information

Responsible editor: Charu Aggarwal.

About this article

Cite this article

Xia, Y. A global optimization method for semi-supervised clustering. Data Min Knowl Disc 18, 214–256 (2009). https://doi.org/10.1007/s10618-008-0104-3

  • DOI: https://doi.org/10.1007/s10618-008-0104-3
