Abstract
Semi-supervised clustering aims to improve clustering performance with the help of user-provided side information. Pairwise constraints have become one of the most studied types of side information. According to previous studies, such constraints can increase clustering performance, but the choice of constraints is critical: improperly selected constraints may even degrade it. To address this problem, researchers have proposed active learning methods that select the most informative pairwise constraints. In this paper, we present a new active learning method for selecting an informative data set, which significantly improves both the Explore phase and the Consolidate phase of the Min-Max algorithm. Experimental results on data sets from the UCI Machine Learning Repository, using MPCK-means as the underlying constraint-based semi-supervised clustering algorithm, show that the proposed algorithm achieves better clustering performance.
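For orientation only, the sketch below illustrates how pairwise constraints can be gathered actively with a farthest-first Explore phase, in the style commonly described for active pairwise-constrained clustering. It is not the method proposed in this paper, whose details the abstract does not give. The names `explore`, `constraints_from`, `oracle`, and the `max_queries` budget are illustrative assumptions.

```python
import math
from itertools import combinations


def explore(points, oracle, k, max_queries):
    """Farthest-first Explore sketch (assumed baseline, not the paper's method).

    Builds up to k disjoint neighborhoods. `oracle(i, j)` returns True when
    points i and j belong together (must-link), False otherwise (cannot-link).
    """
    neighborhoods = [[0]]  # start from an arbitrary point; index 0 here
    assigned = {0}
    queries = 0
    while len(neighborhoods) < k and len(assigned) < len(points):
        # Candidate: the unassigned point farthest from all assigned points.
        candidate = max(
            (i for i in range(len(points)) if i not in assigned),
            key=lambda i: min(math.dist(points[i], points[j]) for j in assigned),
        )
        home, resolved = None, True
        for hood in neighborhoods:
            if queries >= max_queries:
                resolved = False  # budget ran out before the point was placed
                break
            queries += 1
            if oracle(candidate, hood[0]):  # must-link with this neighborhood
                home = hood
                break
        if not resolved:
            break
        if home is not None:
            home.append(candidate)
        else:
            # Cannot-link with every neighborhood: start a new one.
            neighborhoods.append([candidate])
        assigned.add(candidate)
    return neighborhoods, queries


def constraints_from(neighborhoods):
    """Derive must-link and cannot-link pairs implied by the neighborhoods."""
    must = [p for hood in neighborhoods for p in combinations(hood, 2)]
    cannot = [
        (a, b)
        for h1, h2 in combinations(neighborhoods, 2)
        for a in h1
        for b in h2
    ]
    return must, cannot
```

The resulting must-link and cannot-link pairs are the kind of side information a constraint-based algorithm such as MPCK-means consumes. In practice `oracle` would be a human annotator, or ground-truth labels when running experiments.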
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Cai, L., Yu, T., He, T., Chen, L., Lin, M. (2016). Active Learning Method for Constraint-Based Clustering Algorithms. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol 9659. Springer, Cham. https://doi.org/10.1007/978-3-319-39958-4_25
DOI: https://doi.org/10.1007/978-3-319-39958-4_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39957-7
Online ISBN: 978-3-319-39958-4
eBook Packages: Computer Science, Computer Science (R0)