Active Learning Method for Constraint-Based Clustering Algorithms

  • Conference paper
Web-Age Information Management (WAIM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9659)

Abstract

Semi-supervised clustering aims to improve clustering performance with the help of user-provided side information. Pairwise constraints have become one of the most studied types of side information. Previous studies show that such constraints can improve clustering performance, but the choice of constraints is critical: if the constraints are selected improperly, they may even degrade clustering performance. To address this problem, researchers have proposed active learning methods that select the most informative pairwise constraints. In this paper, we present a new active learning method for selecting informative data, which significantly improves both the Explore phase and the Consolidate phase of the Min-Max algorithm. Experimental results on data sets from the UCI Machine Learning Repository, using MPCK-means as the underlying constraint-based semi-supervised clustering algorithm, show that the proposed algorithm achieves better performance.
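To make the Explore and Consolidate phases concrete, here is a minimal Python sketch of the generic Explore/Consolidate scheme for actively selecting pairwise constraints, with farthest-first (min-max) traversal in the Explore phase. It is not the improved method proposed in this paper, whose details are not given here. Assumptions: the user is simulated by a hypothetical oracle that compares ground-truth labels, distances are Euclidean, Consolidate visits the remaining points in index order (the classic scheme picks them at random), and the function names `explore`, `consolidate`, and `select_constraints` are illustrative only. The resulting must-link and cannot-link constraints are the kind of input a constraint-based algorithm such as MPCK-means consumes.

```python
import math
from typing import Callable, List, Sequence, Tuple

Points = Sequence[Sequence[float]]
Constraint = Tuple[int, int, str]     # (i, j, "must-link" | "cannot-link")
Oracle = Callable[[int, int], bool]   # True if i and j belong to the same cluster


def centroid(points: Points, idx: List[int]) -> List[float]:
    """Mean of the points whose indices are listed in idx."""
    dims = len(points[idx[0]])
    return [sum(points[i][d] for i in idx) / len(idx) for d in range(dims)]


def explore(points: Points, k: int, oracle: Oracle, budget: int, start: int = 0):
    """Explore phase (sketch): grow up to k disjoint neighborhoods using
    farthest-first (min-max) traversal, one pairwise query at a time."""
    neighborhoods: List[List[int]] = [[start]]
    selected = {start}
    constraints: List[Constraint] = []
    queries = 0
    while len(neighborhoods) < k and queries < budget and len(selected) < len(points):
        # Min-max rule: pick the unselected point whose distance to its
        # nearest already-selected point is largest.
        x = max(
            (i for i in range(len(points)) if i not in selected),
            key=lambda i: min(math.dist(points[i], points[j]) for j in selected),
        )
        selected.add(x)
        cannot = 0
        for nb in neighborhoods:
            if queries >= budget:
                break
            queries += 1
            if oracle(x, nb[0]):
                constraints.append((x, nb[0], "must-link"))
                nb.append(x)
                break
            constraints.append((x, nb[0], "cannot-link"))
            cannot += 1
        # Only a point cannot-linked to every existing neighborhood starts a new one.
        if cannot == len(neighborhoods):
            neighborhoods.append([x])
    return neighborhoods, constraints, queries


def consolidate(points: Points, neighborhoods: List[List[int]], oracle: Oracle, budget: int):
    """Consolidate phase (sketch): query each remaining point against the
    neighborhoods, nearest centroid first, until a must-link is found."""
    constraints: List[Constraint] = []
    queries = 0
    assigned = {i for nb in neighborhoods for i in nb}
    for x in range(len(points)):  # index order is a simplification
        if x in assigned:
            continue
        order = sorted(neighborhoods, key=lambda nb: math.dist(points[x], centroid(points, nb)))
        for nb in order:
            if queries >= budget:
                return constraints, queries
            queries += 1
            if oracle(x, nb[0]):
                constraints.append((x, nb[0], "must-link"))
                nb.append(x)
                break
            constraints.append((x, nb[0], "cannot-link"))
    return constraints, queries


def select_constraints(points: Points, labels: Sequence[int], k: int, budget: int):
    """Run Explore then Consolidate under a shared query budget.
    The oracle is simulated from ground-truth labels (an assumption)."""
    oracle: Oracle = lambda i, j: labels[i] == labels[j]
    nbs, explore_cons, used = explore(points, k, oracle, budget)
    cons_cons, used2 = consolidate(points, nbs, oracle, budget - used)
    return nbs, explore_cons + cons_cons, used + used2


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    lbl = [0, 0, 0, 1, 1, 1]
    nbs, cons, used = select_constraints(pts, lbl, k=2, budget=10)
    print(nbs, used)
    for c in cons:
        print(c)
```

Every query spends one unit of the budget. Explore stops as soon as k neighborhoods exist; Consolidate then spends the remaining queries, trying the neighborhood with the nearest centroid first so that a must-link answer is likely to come early.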

Author information

Corresponding author

Correspondence to Tinghao Yu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Cai, L., Yu, T., He, T., Chen, L., Lin, M. (2016). Active Learning Method for Constraint-Based Clustering Algorithms. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol 9659. Springer, Cham. https://doi.org/10.1007/978-3-319-39958-4_25

  • DOI: https://doi.org/10.1007/978-3-319-39958-4_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39957-7

  • Online ISBN: 978-3-319-39958-4

  • eBook Packages: Computer Science, Computer Science (R0)
