Skip to main content

Semi Supervised Clustering: A Pareto Approach

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2012)

Abstract

In this paper we present a Pareto based multi objective algorithm for semi supervised clustering (PSC). Semi-supervised clustering uses a small amount of supervised data known as constraints, to assist unsupervised learning. Instead of modifying the clustering objective function, we add another objective function to satisfy specified constraints. We use a lexicographically ordered cluster assignment step to direct the search and a Pareto based multi objective evolutionary algorithm to maintain diversity in the population. Two objectives are considered: one that minimizes the intra cluster variance and another that minimizes the number of constraint violations. Experiments show the superiority of the method over a greedy algorithm (PCK-means) and a genetic algorithm (COP-HGA).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Jain, A.K., Murty, M.N., Flynn, P.J.: Data Clustering: A Review. ACM Comput. Surv., 264–323 (1999)

    Google Scholar 

  2. Basu, S., Banerjee, A., Mooney, R.J.: Active Semi-Supervision for Pairwise Constrained Clustering. In: SDM (2004)

    Google Scholar 

  3. Basu, S., Bilenko, M., Mooney, R.J.: A probabilistic framework for semi-supervised clustering. In: KDD, pp. 59–68 (2004)

    Google Scholar 

  4. Basu, S., Davidson, I., Wagstaff, K.L.: Constrained Clustering: Advances in Algorithms, Theory, and Applications, 1st edn. Chapman and Hall/CRC (2008)

    Google Scholar 

  5. Aliguliyev, R.M.: Clustering of document collection - A weighting approach. Expert Syst. Appl, 7904–7916 (2009)

    Google Scholar 

  6. Maulik, U., Bandyopadhyay, S.: Genetic algorithm-based clustering technique. Pattern Recognition, 1455–1465 (2000)

    Google Scholar 

  7. Das, S., Abraham, A., Konar, A.: Metaheuristic Clustering. SCI, vol. 178. Springer, Heidelberg (2009)

    MATH  Google Scholar 

  8. Cui, X., Palathingal, P., Potok, P.: Document Clustering using Particle Swarm Optimization. In: IEEE Swarm Intelligence Symposium 2005, Pasadena, California, pp. 185–191 (2005)

    Google Scholar 

  9. Handl, J., Meyer, B.: Ant-based and swarm-based clustering. Swarm Intelligence, 95–113 (2007)

    Google Scholar 

  10. Das, S., Konar, A.: Automatic image pixel clustering with an improved differential evolution. Appl. Soft Comput., 226–236 (2009)

    Google Scholar 

  11. Song, W., Choi, L.C., Park, S.C., Ding, X.F.: Fuzzy evolutionary optimization modeling and its applications to unsupervised categorization and extractive summarization. Expert Syst. Appl., 9112–9121 (2011)

    Google Scholar 

  12. Hong, Y., Kwong, S., Xiong, H., Ren, Q.: Genetic-guided semi-supervised clustering algorithm with instance-level constraints. In: GECCO, pp. 1381–1388 (2008)

    Google Scholar 

  13. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, 1st edn. Cambridge University Press (2008)

    Google Scholar 

  14. Davies, D., Bouldin, D.: A cluster separationmeasure. IEEE Trans. Pattern Anal. Mach. Intell. 1, 224–227 (1979)

    Article  Google Scholar 

  15. Coelho, G.P., França, F.O.D., Zuben, F.J.V.: Multi-Objective Biclustering: When Non-dominated Solutions are not Enough. J. Math. Model. Algorithms, 175–202 (2009)

    Google Scholar 

  16. Maitra, M., Chatterjee, A.: A hybrid cooperative-comprehensive learning based PSO algorithm for image segmentation using multilevel thresholding. Expert Syst. Appl., 1341–1350 (2008)

    Google Scholar 

  17. Mitra, S., Banka, H.: Multi-objective evolutionary biclustering of gene expression data. Pattern Recognition, 2464–2477 (2006)

    Google Scholar 

  18. Wagstaff, K., Cardie, C., Rogers, S., Schrödl, S.: Constrained K-means Clustering with Background Knowledge. In: ICML, pp. 577–584 (2001)

    Google Scholar 

  19. Bilenko, M., Basu, S., Mooney, R.J.: Integrating constraints and metric learning in semi-supervised clustering. In: ICML (2004)

    Google Scholar 

  20. Hong, Y., Kwong, S., Wang, H., Ren, Q., Chang, Y.: Probabilistic and Graphical Model based Genetic Algorithm Driven Clustering with Instance-level Constraints. In: IEEE Congress on Evolutionary Computation, pp. 322–329 (2008)

    Google Scholar 

  21. Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A Fast Elitist Non-dominated Sorting Genetic Algorithm for Multi-objective Optimisation: NSGA-II. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 849–858. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  22. Coello, C.A.C.: A Comprehensive Survey of Evolutionary-Based Multiobjective Optimization Techniques. Knowl. Inf. Syst., 129–156 (1999)

    Google Scholar 

  23. Sindhya, K., Deb, K., Miettinen, K.: A Local Search Based Evolutionary Multi-objective Optimization Approach for Fast and Accurate Convergence. In: Rudolph, G., Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.) PPSN 2008. LNCS, vol. 5199, pp. 815–824. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  24. Mahfoud, S.: Niching Methods for Genetic Algorithms. PhD thesis, University of Illinois at Urbana Champaign (1995)

    Google Scholar 

  25. Hruschka, E.R., Campello, R.J.G.B., Freitas, A.A., Carvalho, A.C.P.L.F.D.: A Survey of Evolutionary Algorithms for Clustering. IEEE Transactions on Systems, Man, and Cybernetics, Part C, 133–155 (2009)

    Google Scholar 

  26. Pizzuti, C.: GA-Net: A Genetic Algorithm for Community Detection in Social Networks. In: Rudolph, G., Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.) PPSN 2008. LNCS, vol. 5199, pp. 1081–1090. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  27. Firat, A., Chatterjee, S., Yilmaz, M.: Genetic clustering of social networks using random walks. Computational Statistics & Data Analysis, 6285–6294 (2007)

    Google Scholar 

  28. Mitchell, T.M.: Machine learning. McGraw Hill series in computer science, pp. 1–414 (1997)

    Google Scholar 

  29. Shi, J., Malik, J.: Normalized Cuts and Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 888–905 (2000)

    Google Scholar 

  30. Ng, A.Y., Jordan, M.I., Weiss, Y.: On Spectral Clustering: Analysis and an algorithm. In: NIPS, pp. 849–856 (2001)

    Google Scholar 

  31. UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html

  32. Repository of information on semi-supervised clustering, University of Texas at Austin, http://www.cs.utexas.edu/users/ml/risc/

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ebrahimi, J., Saniee Abadeh, M. (2012). Semi Supervised Clustering: A Pareto Approach. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science(), vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-31537-4_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31536-7

  • Online ISBN: 978-3-642-31537-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics