Abstract
We propose a novel method, Semi-supervised Projection Clustering in Transfer Learning (SPCTL), for a setting with multiple source domains and one target domain. Traditional semi-supervised projection clustering methods assume that the data and pairwise constraints are all drawn from the same domain. In real applications, however, many related data sets with different distributions are available, so the traditional methods cannot be directly extended to such a scenario. One major challenge is how to exploit constraint knowledge from multiple source domains and transfer it to the target domain, where all the data are unlabeled. To handle this difficulty, we construct a common subspace in which the difference in distributions among domains is reduced. We further devise a transferred centroid regularization, which acts as a bridge for transferring the constraint knowledge to the target domain, to formulate the geometric structure formed by the centroids of different domains. Extensive experiments on both synthetic and benchmark data sets show the effectiveness of our method.
References
Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2005). Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research (JMLR), 6, 937–965.
Basu, S., Bilenko, M., & Mooney, R. J. (2004). A probabilistic framework for semi-supervised clustering. In Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 59–68).
Belkin, M., Niyogi, P., & Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research (JMLR), 7, 2399–2434.
Bhattacharya, I., Godbole, S., Joshi, S., & Verma, A. (2009). Cross-guided clustering: Transfer of relevant supervision across domains for improved clustering. In IEEE International Conference on Data Mining (ICDM) (pp. 41–50).
Blitzer, J., McDonald, R., & Pereira, F. (2006). Domain adaptation with structural correspondence learning. In Empirical Methods on Natural Language Processing (EMNLP) (pp. 120–128).
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.
Chattopadhyay, R., Ye, J., Panchanathan, S., Fan, W., & Davidson, I. (2011). Multi-source domain adaptation and its application to early detection of fatigue. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 717–725).
Chen, B., Lam, W., Tsang, I., & Wong, T. L. (2009). Extracting discriminative concepts for domain adaptation in text mining. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 179–188).
Dai, W., Yang, Q., Xue, G.R., & Yu, Y. (2008). Self-taught clustering. In International Conference on Machine Learning (ICML) (pp. 200–207).
Ding, C., He, X., & Simon, H. D. (2005). On the equivalence of nonnegative matrix factorization and spectral clustering. In SIAM International Conference on Data Mining (SDM) (pp. 606–610).
Ding, C., & Li, T. (2007). Adaptive dimension reduction using discriminant analysis and k-means clustering. In International Conference on Machine Learning (ICML) (pp. 84–405).
Ding, C., Li, T., & Jordan, M. I. (2010). Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 32, 45–55.
Greene, D., & Cunningham, P. (2007). Constraint selection by committee: An ensemble approach to identifying informative constraints for semi-supervised clustering. In European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD) (pp. 140–151).
Gretton, A., Bousquet, O., Smola, A. J., & Schölkopf, B. (2005). Measuring statistical dependence with Hilbert–Schmidt norms. In Algorithmic Learning Theory (ALT) (pp. 63–77).
Gu, Q., & Zhou, J. (2009). Learning the shared subspace for multi-task clustering and transductive transfer classification. In IEEE International Conference on Data Mining (ICDM) (pp. 159–168).
Hinton, G. E., & Sejnowski, T. J. (1986). Learning and relearning in Boltzmann machines. In Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1 (pp. 282–317). MIT Press.
Klein, D., Kamvar, S. D., & Manning, C. D. (2002). From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In International Conference on Machine Learning (ICML) (pp. 307–314).
Kulis, B., Basu, S., Dhillon, I., & Mooney, R. (2005). Semi-supervised graph clustering: A Kernel approach. In International Conference on Machine Learning (ICML) (pp. 457–464).
Lee, D. D., & Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems (NIPS) (pp. 556–562).
Ling, X., Dai, W., Xue, G. R., Yang, Q., & Yu, Y. (2008). Spectral domain-transfer learning. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 488–496).
Pan, S. J., Kwok, J. T., & Yang, Q. (2008). Transfer learning via dimensionality reduction. In Conference on Artificial Intelligence (AAAI) (pp. 677–682).
Pan, S. J., Tsang, I. W., Kwok, J. T., & Yang, Q. (2009). Domain adaptation via transfer component analysis. In International Joint Conferences on Artificial Intelligence (IJCAI) (pp. 1187–1192).
Pan, S. J., Tsang, I. W., Kwok, J. T., & Yang, Q. (2011). Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2), 199–210.
Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering (TKDE), 22(10), 1345–1359.
Parsons, L., Haque, E., & Liu, H. (2004). Subspace clustering for high dimensional data: A review. SIGKDD Exploration Newsletter, 6(1), 90–105.
Slonim, N., & Tishby, N. (2000). Document clustering using word clusters via the information bottleneck method. In ACM Special Interest Group on Information Retrieval (SIGIR) (pp. 208–215).
Tang, W., Xiong, H., Zhong, S., & Wu, J. (2007). Enhancing semi-supervised clustering: A feature projection perspective. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 707–716).
Tong, B., Shao, H., Chou, B.-H., & Suzuki, E. (2010). Semi-supervised projection clustering with transferred centroid regularization. In European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD) (pp. 306–321).
Wagstaff, K., & Cardie, C. (2000). Clustering with instance-level constraints. In International Conference on Machine Learning (ICML) (pp. 1103–1110).
Wang, F., Li, T., & Zhang, C. (2008). Semi-supervised clustering via matrix factorization. In SIAM International Conference on Data Mining (SDM) (pp. 1–12).
Ye, J., Zhao, Z., & Liu, H. (2007). Adaptive distance metric learning for clustering. In Computer Vision and Pattern Recognition (CVPR) (pp. 1–7).
Ye, J., Zhao, Z., & Wu, M. (2007). Discriminative K-means for clustering. In Advances in Neural Information Processing Systems (NIPS) (pp. 1649–1656).
Zhang, D., Zhou, Z., & Chen, S. (2007). Semi-supervised dimensionality reduction. In SIAM International Conference on Data Mining (SDM) (pp. 629–634).
Zhong, E., Fan, W., Peng, J., Zhang, J. K., Ren, J., Turaga, D., et al. (2009). Cross domain distribution adaptation via Kernel mapping. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 1027–1036).
Additional information
This work is partially supported by the Grant-in-Aid for Scientific Research (B) 21300053 from the Japanese Ministry of Education, Culture, Sports, Science and Technology.
Appendix
Definition 1
(Lee and Seung 2001) \(Z(h, h^{\prime})\) is an auxiliary function for \(F(h)\) if the conditions
\(Z(h, h^{\prime}) \ge F(h)\) and \(Z(h, h) = F(h)\)
are satisfied.
Lemma 1
(Lee and Seung 2001) If \(Z\) is an auxiliary function for \(F\), then \(F\) is non-increasing under the update
\(h^{(t+1)} = \arg\min_{h} Z(h, h^{(t)})\).
Proof
By construction, we have \(F(h^{(t+1)}) \le Z(h^{(t+1)}, h^{(t)}) \le Z(h^{(t)}, h^{(t)}) = F(h^{(t)})\). Thus, \(F(h^{(t)})\) is non-increasing.□
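To make Lemma 1 concrete, the following minimal NumPy sketch (all variable names are ours, not the paper's) runs Lee and Seung's (2001) multiplicative NMF updates, each step of which minimizes an auxiliary function of the Frobenius objective, and verifies that the objective never increases:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonnegative factorization X ≈ W H (sizes chosen arbitrarily).
n, m, k = 20, 15, 4
X = rng.random((n, m))
W = rng.random((n, k)) + 0.1
H = rng.random((k, m)) + 0.1

def objective(X, W, H):
    """F = ||X - W H||_F^2, the function being majorized."""
    return np.linalg.norm(X - W @ H) ** 2

errors = [objective(X, W, H)]
for _ in range(100):
    # Lee & Seung (2001) multiplicative updates. Each update is the
    # minimizer of an auxiliary function Z(h, h') with Z >= F and
    # Z(h, h) = F(h), so by Lemma 1 the objective cannot increase.
    H *= (W.T @ X) / (W.T @ W @ H + 1e-12)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-12)
    errors.append(objective(X, W, H))

# Monotone non-increase, as Lemma 1 guarantees.
assert all(b <= a + 1e-8 for a, b in zip(errors, errors[1:]))
```

The same mechanism underlies the update (23) below: one does not minimize the objective directly, only a sequence of tractable upper bounds that touch it at the current iterate.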
We write objective function (19) as
where \(\mathbf{H} = \mathbf{L}_{s}\). From Lemma 1, in order to prove that objective function (28) is non-increasing under the update, we need to construct an appropriate auxiliary function \(Z(h^{(t+1)}, h^{(t)})\) and derive its global minimum.
Lemma 2
Given the objective function (28), where all elements of \(\mathbf{H}\) are nonnegative, the following function
is an auxiliary function for \(F_{\mathbf{H}}\). Furthermore, it is a convex function of \(\mathbf{H}\) and its global minimum is
Proof
We rewrite (28) as
We find an upper bound for the first term and a lower bound for the second term. Using Lemma 3 (see below) and setting \(\mathbf{D} \leftarrow \mathbf{I}\) and \(\mathbf{C} \leftarrow \mathbf{A}^{+}\), we obtain the upper bound
To obtain the lower bound for the second term, we use the inequality \(z \ge 1 + \log z\), which holds for any \(z > 0\), and derive
From (31), the second term is bounded from below by
Collecting the two bounds, we obtain \(Z(\mathbf{H}, \mathbf{H}^{\prime})\) as shown in (29). By construction, \(F_{\mathbf{H}} \le Z(\mathbf{H}, \mathbf{H}^{\prime})\) and \(F_{\mathbf{H}} = Z(\mathbf{H}, \mathbf{H})\). To find the minimum of \(Z(\mathbf{H}, \mathbf{H}^{\prime})\), we take the first derivative \(\frac{\partial Z(\mathbf{H}, \mathbf{H}^{\prime})}{\partial \mathbf{H}_{ik}}\).
The Hessian matrix of \(Z(\mathbf{H}, \mathbf{H}^{\prime})\), which contains the second derivatives, is a diagonal matrix with positive entries. Therefore, \(Z(\mathbf{H}, \mathbf{H}^{\prime})\) is a convex function of \(\mathbf{H}\). We then obtain the global minimum by setting \(\frac{{\partial Z(\mathbf{H}, \mathbf{H}^{\prime})}}{{\partial \mathbf{H}_{ik}}} = 0\) and solving for \(\mathbf{H}\), which yields (23).□
Lemma 3
(Ding et al. 2010) For any nonnegative matrices \(\mathbf{C} \in \mathbb{R}^{n \times n}\), \(\mathbf{D} \in \mathbb{R}^{p \times p}\), \(\mathbf{S} \in \mathbb{R}^{n \times p}\), and \(\mathbf{S}^{\prime} \in \mathbb{R}^{n \times p}\), where \(\mathbf{C}\) and \(\mathbf{D}\) are symmetric, the following inequality holds:
\(\sum_{i=1}^{n} \sum_{k=1}^{p} \frac{(\mathbf{C} \mathbf{S}^{\prime} \mathbf{D})_{ik} \, \mathbf{S}_{ik}^{2}}{\mathbf{S}^{\prime}_{ik}} \ge \operatorname{tr}(\mathbf{S}^{\top} \mathbf{C} \mathbf{S} \mathbf{D})\).
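Lemma 3 can be checked numerically. The sketch below (our own; it assumes \(\mathbf{S}^{\prime}\) shares the shape of \(\mathbf{S}\) and the bound as stated in Ding et al. 2010) evaluates both sides on random symmetric nonnegative \(\mathbf{C}\), \(\mathbf{D}\) and strictly positive \(\mathbf{S}\), \(\mathbf{S}^{\prime}\):

```python
import numpy as np

rng = np.random.default_rng(1)

def lemma3_gap(n=6, p=4):
    """Return LHS - RHS of
       sum_{ik} (C S' D)_{ik} S_{ik}^2 / S'_{ik} >= tr(S^T C S D)
    for random symmetric nonnegative C, D and positive S, S'."""
    A = rng.random((n, n)); C = (A + A.T) / 2   # symmetric nonnegative
    B = rng.random((p, p)); D = (B + B.T) / 2
    S = rng.random((n, p)) + 0.1
    Sp = rng.random((n, p)) + 0.1               # S', strictly positive
    lhs = np.sum((C @ Sp @ D) * S**2 / Sp)
    rhs = np.trace(S.T @ C @ S @ D)
    return lhs - rhs

gaps = [lemma3_gap() for _ in range(200)]
assert min(gaps) >= -1e-9   # inequality holds on every random trial
```

Note that at \(\mathbf{S} = \mathbf{S}^{\prime}\) the two sides coincide, which is exactly the touching condition that makes the bound usable as part of an auxiliary function.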
Cite this article
Tong, B., Shao, H., Chou, BH. et al. Linear semi-supervised projection clustering by transferred centroid regularization. J Intell Inf Syst 39, 461–490 (2012). https://doi.org/10.1007/s10844-012-0198-3