Abstract
We propose a novel method, Semi-supervised Projection Clustering in Transfer Learning (SPCTL), for a setting with multiple source domains and one target domain. Traditional semi-supervised projection clustering methods assume that the data and pairwise constraints are all drawn from the same domain. In real applications, however, many related data sets with different distributions are available, so the traditional methods cannot be directly extended to such a scenario. One major challenge is how to exploit constraint knowledge from multiple source domains and transfer it to the target domain, where all the data are unlabeled. To handle this difficulty, we construct a common subspace in which the difference in distributions among domains is reduced. We further devise a transferred centroid regularization, which acts as a bridge for transferring the constraint knowledge to the target domain, to formulate the geometric structure formed by the centroids of different domains. Extensive experiments on both synthetic and benchmark data sets show the effectiveness of our method.
References
Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2005). Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research (JMLR), 6, 937–965.
Basu, S., Bilenko, M., & Mooney, R. J. (2004). A probabilistic framework for semi-supervised clustering. In Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 59–68).
Belkin, M., Niyogi, P., & Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research (JMLR), 7, 2399–2434.
Bhattacharya, I., Godbole, S., Joshi, S., & Verma, A. (2009). Cross-guided clustering: Transfer of relevant supervision across domains for improved clustering. In IEEE International Conference on Data Mining (ICDM) (pp. 41–50).
Blitzer, J., McDonald, R., & Pereira, F. (2006). Domain adaptation with structural correspondence learning. In Empirical Methods on Natural Language Processing (EMNLP) (pp. 120–128).
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.
Chattopadhyay, R., Ye, J., Panchanathan, S., Fan, W., & Davidson, I. (2011). Multi-source domain adaptation and its application to early detection of fatigue. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 717–725).
Chen, B., Lam, W., Tsang, I., & Wong, T. L. (2009). Extracting discriminative concepts for domain adaptation in text mining. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 179–188).
Dai, W., Yang, Q., Xue, G.R., & Yu, Y. (2008). Self-taught clustering. In International Conference on Machine Learning (ICML) (pp. 200–207).
Ding, C., He, X., & Simon, H. D. (2005). On the equivalence of nonnegative matrix factorization and spectral clustering. In SIAM International Conference on Data Mining (SDM) (pp. 606–610).
Ding, C., & Li, T. (2007). Adaptive dimension reduction using discriminant analysis and k-means clustering. In International Conference on Machine Learning (ICML) (pp. 84–405).
Ding, C., Li, T., & Jordan, M. I. (2010). Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 32, 45–55.
Greene, D., & Cunningham, P. (2007). Constraint selection by committee: An ensemble approach to identifying informative constraints for semi-supervised clustering. In European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD) (pp. 140–151).
Gretton, A., Bousquet, O., Smola, A. J., & Schölkopf, B. (2005). Measuring statistical dependence with Hilbert–Schmidt norms. In Algorithmic Learning Theory (ALT) (pp. 63–77).
Gu, Q., & Zhou, J. (2009). Learning the shared subspace for multi-task clustering and transductive transfer classification. In IEEE International Conference on Data Mining (ICDM) (pp. 159–168).
Hinton, G. E., & Sejnowski, T. J. (1986). Learning and relearning in Boltzmann machines. In Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1 (pp. 282–317). MIT Press.
Klein, D., Kamvar, S. D., & Manning, C. D. (2002). From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In International Conference on Machine Learning (ICML) (pp. 307–314).
Kulis, B., Basu, S., Dhillon, I., & Mooney, R. (2005). Semi-supervised graph clustering: A Kernel approach. In International Conference on Machine Learning (ICML) (pp. 457–464).
Lee, D. D., & Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems (NIPS) (pp. 556–562).
Ling, X., Dai, W., Xue, G. R., Yang, Q., & Yu, Y. (2008). Spectral domain-transfer learning. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 488–496).
Pan, S. J., Kwok, J. T., & Yang, Q. (2008). Transfer learning via dimensionality reduction. In Conference on Artificial Intelligence (AAAI) (pp. 677–682).
Pan, S. J., Tsang, I. W., Kwok, J. T., & Yang, Q. (2009). Domain adaptation via transfer component analysis. In International Joint Conferences on Artificial Intelligence (IJCAI) (pp. 1187–1192).
Pan, S. J., Tsang, I. W., Kwok, J. T., & Yang, Q. (2011). Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2), 199–210.
Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering (TKDE), 22(10), 1345–1359.
Parsons, L., Haque, E., & Liu, H. (2004). Subspace clustering for high dimensional data: A review. SIGKDD Exploration Newsletter, 6(1), 90–105.
Slonim, N., & Tishby, N. (2000). Document clustering using word clusters via the information bottleneck method. In ACM Special Interest Group on Information Retrieval (SIGIR) (pp. 208–215).
Tang, W., Xiong, H., Zhong, S., & Wu, J. (2007). Enhancing semi-supervised clustering: A feature projection perspective. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 707–716).
Tong, B., Shao, H., Chou, B.-H., & Suzuki, E. (2010). Semi-supervised projection clustering with transferred centroid regularization. In European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD) (pp. 306–321).
Wagstaff, K., & Cardie, C. (2000). Clustering with instance-level constraints. In International Conference on Machine Learning (ICML) (pp. 1103–1110).
Wang, F., Li, T., & Zhang, C. (2008). Semi-supervised clustering via matrix factorization. In SIAM International Conference on Data Mining (SDM) (pp. 1–12).
Ye, J., Zhao, Z., & Liu, H. (2007). Adaptive distance metric learning for clustering. In Computer Vision and Pattern Recognition (CVPR) (pp. 1–7).
Ye, J., Zhao, Z., & Wu, M. (2007). Discriminative K-means for clustering. In Advances in Neural Information Processing Systems (NIPS) (pp. 1649–1656).
Zhang, D., Zhou, Z., & Chen, S. (2007). Semi-supervised dimensionality reduction. In SIAM International Conference on Data Mining (SDM) (pp. 629–634).
Zhong, E., Fan, W., Peng, J., Zhang, J. K., Ren, J., Turaga, D., et al. (2009). Cross domain distribution adaptation via Kernel mapping. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 1027–1036).
Additional information
This work is partially supported by the Grant-in-Aid for Scientific Research (B) 21300053 from the Japanese Ministry of Education, Culture, Sports, Science and Technology.
Appendix
Definition 1
(Lee and Seung 2001) \(Z(h, h^{\prime})\) is an auxiliary function for \(F(h)\) if the conditions
\(Z(h, h^{\prime}) \ge F(h)\) and \(Z(h, h) = F(h)\)
are satisfied.
Lemma 1
(Lee and Seung 2001) If \(Z\) is an auxiliary function for \(F\), then \(F\) is non-increasing under the update
\(h^{(t+1)} = \arg\min_{h} Z(h, h^{(t)})\).
Proof
By construction, we have \(F(h^{(t+1)}) \le Z(h^{(t+1)}, h^{(t)}) \le Z(h^{(t)}, h^{(t)}) = F(h^{(t)})\). Thus, \(F(h^{(t)})\) is non-increasing.□
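To make Lemma 1 concrete, the following minimal NumPy sketch (all variable names are ours, not the paper's) runs Lee and Seung's (2001) multiplicative NMF updates, each step of which minimizes an auxiliary function of the Frobenius objective, and verifies that the objective never increases:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonnegative factorization X ≈ W H (sizes chosen arbitrarily).
n, m, k = 20, 15, 4
X = rng.random((n, m))
W = rng.random((n, k)) + 0.1
H = rng.random((k, m)) + 0.1

def objective(X, W, H):
    """F = ||X - W H||_F^2, the function being majorized."""
    return np.linalg.norm(X - W @ H) ** 2

errors = [objective(X, W, H)]
for _ in range(100):
    # Lee & Seung (2001) multiplicative updates. Each update is the
    # minimizer of an auxiliary function Z(h, h') with Z >= F and
    # Z(h, h) = F(h), so by Lemma 1 the objective cannot increase.
    H *= (W.T @ X) / (W.T @ W @ H + 1e-12)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-12)
    errors.append(objective(X, W, H))

# Monotone non-increase, as Lemma 1 guarantees.
assert all(b <= a + 1e-8 for a, b in zip(errors, errors[1:]))
```

The same mechanism underlies the update (23) below: one does not minimize the objective directly, only a sequence of tractable upper bounds that touch it at the current iterate.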
We write objective function (19) as
where \(\mathbf{H} = \mathbf{L}_{s}\). From Lemma 1, in order to prove that objective function (28) is non-increasing under the update, we need to construct an appropriate auxiliary function \(Z(h^{(t+1)}, h^{(t)})\) and derive its global minimum.
Lemma 2
Given the objective function (28), where all elements of \(\mathbf{H}\) are nonnegative, the following function
is an auxiliary function for \(F_{\mathbf{H}}\). Furthermore, it is a convex function of \(\mathbf{H}\) and its global minimum is
Proof
We rewrite (28) as
We find an upper bound for the first term and a lower bound for the second term. Using Lemma 3 (see below) and setting \(\mathbf{D} \leftarrow \mathbf{I}\) and \(\mathbf{C} \leftarrow \mathbf{A}^{+}\), we obtain the upper bound
To obtain the lower bound for the second term, we use the inequality \(z \ge 1 + \log z\), which holds for any \(z > 0\), and derive
From (31), the second term is bounded from below by
Collecting the two bounds, we obtain \(Z(\mathbf{H}, \mathbf{H}^{\prime})\) as shown in (29). By construction, \(F_{\mathbf{H}} \le Z(\mathbf{H}, \mathbf{H}^{\prime})\) and \(F_{\mathbf{H}} = Z(\mathbf{H}, \mathbf{H})\). To find the minimum of \(Z(\mathbf{H}, \mathbf{H}^{\prime})\), we take the first derivative \(\frac{\partial Z(\mathbf{H}, \mathbf{H}^{\prime})}{\partial \mathbf{H}_{ik}}\).
The Hessian matrix of \(Z(\mathbf{H}, \mathbf{H}^{\prime})\), which contains the second derivatives, is a diagonal matrix with positive entries. Therefore, \(Z(\mathbf{H}, \mathbf{H}^{\prime})\) is a convex function of \(\mathbf{H}\). We then obtain the global minimum by setting \(\frac{{\partial Z(\mathbf{H}, \mathbf{H}^{\prime})}}{{\partial \mathbf{H}_{ik}}} = 0\) and solving for \(\mathbf{H}\), which yields (23).□
Lemma 3
(Ding et al. 2010) For any nonnegative matrices \(\mathbf{C} \in \mathbb{R}^{n \times n}\), \(\mathbf{D} \in \mathbb{R}^{p \times p}\), \(\mathbf{S} \in \mathbb{R}^{n \times p}\), and \(\mathbf{S}^{\prime} \in \mathbb{R}^{n \times p}\), where \(\mathbf{C}\) and \(\mathbf{D}\) are symmetric, the following inequality holds:
\(\sum_{i=1}^{n} \sum_{k=1}^{p} \frac{(\mathbf{C} \mathbf{S}^{\prime} \mathbf{D})_{ik} \, \mathbf{S}_{ik}^{2}}{\mathbf{S}^{\prime}_{ik}} \ge \operatorname{tr}(\mathbf{S}^{\top} \mathbf{C} \mathbf{S} \mathbf{D})\).
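Lemma 3 can be checked numerically. The sketch below (our own; it assumes \(\mathbf{S}^{\prime}\) shares the shape of \(\mathbf{S}\) and the bound as stated in Ding et al. 2010) evaluates both sides on random symmetric nonnegative \(\mathbf{C}\), \(\mathbf{D}\) and strictly positive \(\mathbf{S}\), \(\mathbf{S}^{\prime}\):

```python
import numpy as np

rng = np.random.default_rng(1)

def lemma3_gap(n=6, p=4):
    """Return LHS - RHS of
       sum_{ik} (C S' D)_{ik} S_{ik}^2 / S'_{ik} >= tr(S^T C S D)
    for random symmetric nonnegative C, D and positive S, S'."""
    A = rng.random((n, n)); C = (A + A.T) / 2   # symmetric nonnegative
    B = rng.random((p, p)); D = (B + B.T) / 2
    S = rng.random((n, p)) + 0.1
    Sp = rng.random((n, p)) + 0.1               # S', strictly positive
    lhs = np.sum((C @ Sp @ D) * S**2 / Sp)
    rhs = np.trace(S.T @ C @ S @ D)
    return lhs - rhs

gaps = [lemma3_gap() for _ in range(200)]
assert min(gaps) >= -1e-9   # inequality holds on every random trial
```

Note that at \(\mathbf{S} = \mathbf{S}^{\prime}\) the two sides coincide, which is exactly the touching condition that makes the bound usable as part of an auxiliary function.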
Cite this article
Tong, B., Shao, H., Chou, BH. et al. Linear semi-supervised projection clustering by transferred centroid regularization. J Intell Inf Syst 39, 461–490 (2012). https://doi.org/10.1007/s10844-012-0198-3