
Linear semi-supervised projection clustering by transferred centroid regularization


Abstract

We propose a novel method, called Semi-supervised Projection Clustering in Transfer Learning (SPCTL), in which multiple source domains and one target domain are assumed. Traditional semi-supervised projection clustering methods assume that the data and pairwise constraints are all drawn from the same domain. However, many related data sets with different distributions are available in real applications, so the traditional methods cannot be directly extended to such a scenario. One major challenge is how to exploit constraint knowledge from multiple source domains and transfer it to the target domain, in which all the data are unlabeled. To handle this difficulty, we construct a common subspace in which the difference in distributions among domains is reduced. We also devise a transferred centroid regularization, which acts as a bridge for transferring the constraint knowledge to the target domain by exploiting the geometric structure formed by the centroids from different domains. Extensive experiments on both synthetic and benchmark data sets show the effectiveness of our method.


Notes

  1. http://people.csail.mit.edu/jrennie/20Newsgroups/

  2. http://www.cs.cmu.edu/afs/cs/project/theo-20/www/data/

References

  • Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2005). Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research (JMLR), 6, 937–965.

  • Basu, S., Bilenko, M., & Mooney, R. J. (2004). A probabilistic framework for semi-supervised clustering. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 59–68).

  • Belkin, M., Niyogi, P., & Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research (JMLR), 7, 2399–2434.

  • Bhattacharya, I., Godbole, S., Joshi, S., & Verma, A. (2009). Cross-guided clustering: Transfer of relevant supervision across domains for improved clustering. In IEEE International Conference on Data Mining (ICDM) (pp. 41–50).

  • Blitzer, J., McDonald, R., & Pereira, F. (2006). Domain adaptation with structural correspondence learning. In Empirical Methods in Natural Language Processing (EMNLP) (pp. 120–128).

  • Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.

  • Chattopadhyay, R., Ye, J., Panchanathan, S., Fan, W., & Davidson, I. (2011). Multi-source domain adaptation and its application to early detection of fatigue. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 717–725).

  • Chen, B., Lam, W., Tsang, I., & Wong, T. L. (2009). Extracting discriminative concepts for domain adaptation in text mining. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 179–188).

  • Dai, W., Yang, Q., Xue, G. R., & Yu, Y. (2008). Self-taught clustering. In International Conference on Machine Learning (ICML) (pp. 200–207).

  • Ding, C., He, X., & Simon, H. D. (2005). On the equivalence of nonnegative matrix factorization and spectral clustering. In SIAM International Conference on Data Mining (SDM) (pp. 606–610).

  • Ding, C., & Li, T. (2007). Adaptive dimension reduction using discriminant analysis and k-means clustering. In International Conference on Machine Learning (ICML) (pp. 84–405).

  • Ding, C., Li, T., & Jordan, M. I. (2010). Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 32(1), 45–55.

  • Greene, D., & Cunningham, P. (2007). Constraint selection by committee: An ensemble approach to identifying informative constraints for semi-supervised clustering. In European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD) (pp. 140–151).

  • Gretton, A., Bousquet, O., Smola, A. J., & Schölkopf, B. (2005). Measuring statistical dependence with Hilbert–Schmidt norms. In Algorithmic Learning Theory (ALT) (pp. 63–77).

  • Gu, Q., & Zhou, J. (2009). Learning the shared subspace for multi-task clustering and transductive transfer classification. In IEEE International Conference on Data Mining (ICDM) (pp. 159–168).

  • Hinton, G. E., & Sejnowski, T. J. (1986). Learning and relearning in Boltzmann machines. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1, pp. 282–317). MIT Press.

  • Klein, D., Kamvar, S. D., & Manning, C. D. (2002). From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In International Conference on Machine Learning (ICML) (pp. 307–314).

  • Kulis, B., Basu, S., Dhillon, I., & Mooney, R. (2005). Semi-supervised graph clustering: A kernel approach. In International Conference on Machine Learning (ICML) (pp. 457–464).

  • Lee, D. D., & Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems (NIPS) (pp. 556–562).

  • Ling, X., Dai, W., Xue, G. R., Yang, Q., & Yu, Y. (2008). Spectral domain-transfer learning. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 488–496).

  • Pan, S. J., Kwok, J. T., & Yang, Q. (2008). Transfer learning via dimensionality reduction. In AAAI Conference on Artificial Intelligence (AAAI) (pp. 677–682).

  • Pan, S. J., Tsang, I. W., Kwok, J. T., & Yang, Q. (2009). Domain adaptation via transfer component analysis. In International Joint Conference on Artificial Intelligence (IJCAI) (pp. 1187–1192).

  • Pan, S. J., Tsang, I. W., Kwok, J. T., & Yang, Q. (2011). Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2), 199–210.

  • Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering (TKDE), 22(10), 1345–1359.

  • Parsons, L., Haque, E., & Liu, H. (2004). Subspace clustering for high dimensional data: A review. SIGKDD Explorations Newsletter, 6(1), 90–105.

  • Slonim, N., & Tishby, N. (2000). Document clustering using word clusters via the information bottleneck method. In ACM Special Interest Group on Information Retrieval (SIGIR) (pp. 208–215).

  • Tang, W., Xiong, H., Zhong, S., & Wu, J. (2007). Enhancing semi-supervised clustering: A feature projection perspective. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 707–716).

  • Tong, B., Shao, H., Chou, B.-H., & Suzuki, E. (2010). Semi-supervised projection clustering with transferred centroid regularization. In European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD) (pp. 306–321).

  • Wagstaff, K., & Cardie, C. (2000). Clustering with instance-level constraints. In International Conference on Machine Learning (ICML) (pp. 1103–1110).

  • Wang, F., Li, T., & Zhang, C. (2008). Semi-supervised clustering via matrix factorization. In SIAM International Conference on Data Mining (SDM) (pp. 1–12).

  • Ye, J., Zhao, Z., & Liu, H. (2007). Adaptive distance metric learning for clustering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1–7).

  • Ye, J., Zhao, Z., & Wu, M. (2007). Discriminative K-means for clustering. In Advances in Neural Information Processing Systems (NIPS) (pp. 1649–1656).

  • Zhang, D., Zhou, Z., & Chen, S. (2007). Semi-supervised dimensionality reduction. In SIAM International Conference on Data Mining (SDM) (pp. 629–634).

  • Zhong, E., Fan, W., Peng, J., Zhang, J. K., Ren, J., Turaga, D., et al. (2009). Cross domain distribution adaptation via kernel mapping. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (pp. 1027–1036).


Author information

Corresponding author

Correspondence to Bin Tong.

Additional information

This work is partially supported by the grant-in-aid for scientific research on fundamental research (B) 21300053 from the Japanese Ministry of Education, Culture, Sports, Science and Technology.

Appendix


Definition 1

(Lee and Seung 2001) Z(h,h′) is an auxiliary function for F(h) if the conditions

$$ \begin{array}{rll} Z(h,{h^{\prime}}) &\ge& F(h) \\ Z(h,h) &=& F(h) \end{array} $$

are satisfied.

Lemma 1

(Lee and Seung 2001) If $Z$ is an auxiliary function for $F$, then $F$ is non-increasing under the update

$$ {h^{(t + 1)}} = \arg \mathop { \min }\limits_h Z(h,{h^{(t)}}) $$

Proof

By construction, we have $F(h^{(t+1)}) \le Z(h^{(t+1)}, h^{(t)}) \le Z(h^{(t)}, h^{(t)}) = F(h^{(t)})$, where the second inequality holds because $h^{(t+1)}$ minimizes $Z(h, h^{(t)})$. Thus, $F(h^{(t)})$ is monotonically non-increasing.□
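As a self-contained illustration of Definition 1 and Lemma 1 (a toy example of our own, not part of the paper), the following Python sketch uses the hypothetical objective $F(h) = (h-3)^2$ and the auxiliary function $Z(h, h^{\prime}) = F(h) + (h - h^{\prime})^2$, which satisfies $Z(h, h^{\prime}) \ge F(h)$ and $Z(h, h) = F(h)$; iterating the update of Lemma 1 never increases $F$.

```python
# Toy majorize-minimize illustration of Definition 1 / Lemma 1 (hypothetical
# example, not the paper's objective): F(h) = (h - 3)^2, Z(h, h') = F(h) + (h - h')^2.

def F(h):
    return (h - 3.0) ** 2

def argmin_Z(h_prev):
    # dZ/dh = 2(h - 3) + 2(h - h_prev) = 0  =>  h = (3 + h_prev) / 2
    return (3.0 + h_prev) / 2.0

h = 10.0
for t in range(20):
    h_next = argmin_Z(h)
    assert F(h_next) <= F(h) + 1e-12   # Lemma 1: F never increases
    h = h_next
print(h, F(h))   # h approaches the minimizer 3, F(h) approaches 0
```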

We write the objective function (19) as

$$ \begin{array}{rll} F_{\mathbf{H}} &=& \textrm{tr}(\mathbf{H}^T \mathbf{A}\mathbf{H}) \\ &\rm{s.t.}& \mathbf{H} \ge 0 \end{array} $$
(28)

where $\mathbf{H} = \mathbf{L}_s$. From Lemma 1, in order to show that objective function (28) is non-increasing under the update, we need to construct an appropriate auxiliary function $Z(\mathbf{H}, \mathbf{H}^{\prime})$ and derive its global minimum.

Lemma 2

Given the objective function (28) where all elements in H are nonnegative, the following function

$$ Z(\mathbf{H}, \mathbf{H}^{\prime}) = \sum\limits_{ik} {\frac{{{{({\mathbf{A}}^{+}{{\mathbf{H}}^{\prime}})}_{ik}}{\mathbf{H}}_{ik}^2}}{{{\mathbf{H}}^{\prime}_{ik}}}} - \sum\limits_{ikl} {{{\mathbf{A}}^{-}_{il} }\mathbf{H}^{\prime}_{lk} \mathbf{H}^{\prime}_{ik} \left(1 + \log \frac{{{\mathbf{H}_{lk}}} {{\mathbf{H}_{ik}}} }{{{\mathbf{H}^{\prime}_{lk}}} {\mathbf{H}^{\prime}_{ik}} }\right)} $$
(29)

is an auxiliary function for $F_{\mathbf{H}}$. Furthermore, it is a convex function in $\mathbf{H}$ and its global minimum is

$$ \mathbf{H}_{ij} = \mathbf{H}^{\prime}_{ij} \sqrt {\frac{{\left[{\bf{A}}^{-}\mathbf{H}^{\prime}\right]_{ij} }}{{\left[{\bf{A}}^{+}\mathbf{H}^{\prime} \right]_{ij} }}} $$
(30)

Proof

We rewrite (28) as

$$ F_\mathbf{H} = \textrm{tr}(\mathbf{H}^T \mathbf{A}^{+}\mathbf{H} - \mathbf{H}^T \mathbf{A}^{-}\mathbf{H}) $$

We find an upper bound for the first term and a lower bound for the second term. Using Lemma 3 (see below) with $\mathbf{D} = \mathbf{I}$, $\mathbf{C} = \mathbf{A}^{+}$, $\mathbf{S} = \mathbf{H}$, and $\mathbf{S}^{\prime} = \mathbf{H}^{\prime}$, we obtain the upper bound

$$ \textrm{tr}(\mathbf{H}^T \mathbf{A}^{+}\mathbf{H}) \le \sum\limits_{i = 1}^n {\sum\limits_{k = 1}^p {\frac{{{{({\mathbf{A}}^{+}{{\mathbf{H}}^{\prime}})}_{ik}}{\mathbf{H}}_{ik}^2}}{{{\mathbf{H}}^{\prime}_{ik}}}} } $$

To obtain the lower bound for the second term, we use the inequality $z \ge 1 + \log z$, which holds for any $z > 0$, and derive

$$ \frac{{{\mathbf{H}_{lk}}} {{\mathbf{H}_{ik}}} }{{\mathbf{H}^{\prime}_{lk}} {\mathbf{H}^{\prime}_{ik}}} \ge 1 + \log \frac{{{\mathbf{H}_{lk}}} {{\mathbf{H}_{ik}}} }{{{\mathbf{H}^{\prime}_{lk}}} {\mathbf{H}^{\prime}_{ik}} } $$
(31)

From (31), the second term is bounded by

$$ \textrm{tr}(\mathbf{H}^T \mathbf{A}^{-}\mathbf{H}) \ge \sum\limits_{ikl} {{{\mathbf{A}}^{-}_{il} }\mathbf{H}^{\prime}_{lk} \mathbf{H}^{\prime}_{ik} \left(1 + \log \frac{{{\mathbf{H}_{lk}}} {{\mathbf{H}_{ik}}} }{{{\mathbf{H}^{\prime}_{lk}}} {\mathbf{H}^{\prime}_{ik}} }\right)} $$
(32)

Collecting the two bounds, we obtain $Z(\mathbf{H}, \mathbf{H}^{\prime})$ as shown in (29). It follows that $F_{\mathbf{H}} \le Z(\mathbf{H}, \mathbf{H}^{\prime})$ and $Z(\mathbf{H}, \mathbf{H}) = F_{\mathbf{H}}$, so $Z(\mathbf{H}, \mathbf{H}^{\prime})$ is indeed an auxiliary function for $F_{\mathbf{H}}$. To find the minimum of $Z(\mathbf{H}, \mathbf{H}^{\prime})$, we take

$$ \frac{{\partial Z(\mathbf{H}, \mathbf{H}^{\prime})}}{{\partial \mathbf{H}_{ik}}} = \frac{2(\mathbf{A}^{+}\mathbf{H}^{\prime})_{ik} \mathbf{H}_{ik}}{\mathbf{H}^{\prime}_{ik}} - \frac{2(\mathbf{A}^{-}\mathbf{H}^{\prime})_{ik} \mathbf{H}^{\prime}_{ik}}{\mathbf{H}_{ik}} $$
(33)

The Hessian matrix of Z(H, H′), which contains the second derivatives,

$$ \frac{{\partial^2 Z(\mathbf{H}, \mathbf{H}^{\prime})}}{{\partial \mathbf{H}_{ik}}{\partial \mathbf{H}_{jl}}} = \delta_{ij} \delta_{kl}\left( \frac{2(\mathbf{A}^{-} \mathbf{H}^{\prime})_{ik} \mathbf{H}^{\prime}_{ik}}{\mathbf{H}^{2}_{ik}} + \frac{2(\mathbf{A}^{+} \mathbf{H}^{\prime})_{ik} }{\mathbf{H}^{\prime}_{ik}} \right) $$
(34)

is a diagonal matrix with positive entries. Therefore, $Z(\mathbf{H}, \mathbf{H}^{\prime})$ is a convex function of $\mathbf{H}$. We then obtain the global minimum (30) by setting \(\frac{{\partial Z(\mathbf{H}, \mathbf{H}^{\prime})}}{{\partial \mathbf{H}_{ik}}} = 0\) and solving for $\mathbf{H}$, which yields the update rule (23).□
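To make the derived rule concrete, here is a minimal Python sketch of the multiplicative update (30); it is our own reconstruction under stated assumptions, not the authors' implementation. The matrix A below is a hypothetical stand-in for the matrix in (28) (a graph Laplacian, so that the objective is bounded below); it is split into nonnegative parts A = A⁺ − A⁻ as in the proof, and the sketch checks that tr(HᵀAH) never increases, as guaranteed by Lemmas 1 and 2.

```python
import numpy as np

# Sketch of the multiplicative update (30). Illustrative only: A is a
# hypothetical stand-in, not the matrix defined by the paper's objective (19).
rng = np.random.default_rng(0)
n, p = 8, 3

W = rng.random((n, n)); W = (W + W.T) / 2.0; np.fill_diagonal(W, 0.0)
A = np.diag(W.sum(axis=1)) - W          # symmetric graph Laplacian: tr(H^T A H) >= 0
A_plus = (np.abs(A) + A) / 2.0          # elementwise nonnegative split A = A_plus - A_minus
A_minus = (np.abs(A) - A) / 2.0

H = rng.random((n, p)) + 0.1            # H plays the role of L_s; keep entries positive
objective = lambda H: np.trace(H.T @ A @ H)

eps = 1e-12                             # numerical guard against division by zero
prev = objective(H)
for t in range(200):
    H = H * np.sqrt((A_minus @ H + eps) / (A_plus @ H + eps))   # update (30)
    cur = objective(H)
    assert cur <= prev + 1e-8           # Lemma 1: the objective is non-increasing
    prev = cur
print(objective(H))
```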

Lemma 3

(Ding et al. 2010) For any nonnegative matrices $\mathbf{C} \in \mathbb{R}^{n \times n}$, $\mathbf{D} \in \mathbb{R}^{p \times p}$, $\mathbf{S} \in \mathbb{R}^{n \times p}$, and $\mathbf{S}^{\prime} \in \mathbb{R}^{n \times p}$, where $\mathbf{C}$ and $\mathbf{D}$ are symmetric, the following inequality holds.

$$ \sum\limits_{i = 1}^n {\sum\limits_{k = 1}^p {\frac{{{{({\mathbf{C}}{{\mathbf{S}}^{\prime}}{\mathbf{D}})}_{ik}}{\mathbf{S}}_{ik}^2}}{{{\mathbf{S}}^{\prime}_{ik}}}} } \ge tr({{\mathbf{S}}^T}{\mathbf{CSD}}) $$
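As a quick sanity check (not part of the original appendix), the following Python sketch draws random nonnegative symmetric C and D together with random nonnegative S and S′ and verifies the inequality numerically.

```python
import numpy as np

# Numerical spot-check of the inequality in Lemma 3 (illustration only).
rng = np.random.default_rng(1)
n, p = 6, 4

C = rng.random((n, n)); C = (C + C.T) / 2.0   # nonnegative symmetric, n x n
D = rng.random((p, p)); D = (D + D.T) / 2.0   # nonnegative symmetric, p x p
S = rng.random((n, p))
S_prime = rng.random((n, p)) + 0.1            # strictly positive to avoid division by zero

lhs = np.sum((C @ S_prime @ D) * S**2 / S_prime)
rhs = np.trace(S.T @ C @ S @ D)
print(lhs >= rhs)   # prints True: the bound of Lemma 3 holds
```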


About this article

Cite this article

Tong, B., Shao, H., Chou, BH. et al. Linear semi-supervised projection clustering by transferred centroid regularization. J Intell Inf Syst 39, 461–490 (2012). https://doi.org/10.1007/s10844-012-0198-3
