Abstract
Traditional graph-based multi-view learning methods usually assume that the data are complete. In practice, however, some instances of certain views may be missing, which makes the corresponding graphs incomplete and weakens the benefit of graph regularization. To mitigate this negative effect, a novel method, called incomplete multi-view learning via consensus graph completion (IMLCGC), is proposed in this paper, which completes the incomplete graphs based on the consensus among different views and then fuses the completed graphs into a common graph. Specifically, IMLCGC develops a learning framework for incomplete multi-view data with three components: consensus low-dimensional representation, graph regularization, and consensus graph completion. Furthermore, a generalization error bound of the model is established based on Rademacher complexity, which shows theoretically why learning with incomplete multi-view data is difficult. Experimental results on six well-known datasets indicate that IMLCGC significantly outperforms state-of-the-art methods.
References
Zhao J, Xie XJ, Xu X, Sun SL (2017) Multi-view learning overview: recent progress and new challenges. Inf Fusion 38:43–54
Cai WL, Zhou HH, Xu L (2021) A multi-view co-training clustering algorithm based on global and local structure preserving. IEEE Access 9:29293–29302
Kumar A, Daumé H (2011) A co-training approach for multi-view spectral clustering. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 393–400
Liu C, Yuen PC (2011) A boosted co-training algorithm for human action recognition. IEEE Trans Circuits Syst Video Technol 21(9):1203–1213
Yang XH, Liu WF, Liu W, Tao DC (2021) A survey on canonical correlation analysis. IEEE Trans Knowl Data Eng 33(6):2349–2368
Brbic M, Kopriva I (2018) Multi-view low-rank sparse subspace clustering. Pattern Recogn 73:247–258
Zhao Y, You X, Yu S, Xu C, Yuan W, Jing X-Y, Zhang T, Tao D (2018) Multi-view manifold learning with locality alignment. Pattern Recogn 78:154–166
Xie XJ, Sun SL (2019) General multi-view learning with maximum entropy discrimination. Neurocomputing 332:184–192
Liu XW, Dou Y, Yin JP, Wang L, Zhu E (2016) Multiple kernel k-means clustering with matrix-induced regularization. In: 30th Association-for-the-Advancement-of-Artificial-Intelligence (AAAI) conference on artificial intelligence, pp 1888–1894
Chao GQ, Sun SL (2016) Multi-kernel maximum entropy discrimination for multi-view learning. Intell Data Anal 20(3):481–493
Zhao W, Xu C, Guan ZY, Liu Y (2021) Multiview concept learning via deep matrix factorization. IEEE Trans Neural Netw Learn Syst 32(2):814–825
Yan XQ, Hu SZ, Mao YQ, Ye YD, Yu H (2021) Deep multi-view learning methods: a review. Neurocomputing 448:106–129
Sun G, Cong Y, Zhang YL, Zhao GS, Fu Y (2021) Continual multiview task learning via deep matrix factorization. IEEE Trans Neural Netw Learn Syst 32(1):139–150
Tan G, Wang Z, Shi Z (2021) Proportional-integral state estimator for quaternion-valued neural networks with time-varying delays. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2021.3103979
Liu Y, Fan L, Zhang C, Zhou T, Xiao Z, Geng L, Shen D (2021) Incomplete multi-modal representation learning for Alzheimer’s disease diagnosis. Med Image Anal. https://doi.org/10.1016/j.media.2020.101953
Yang WQ, Shi YH, Gao Y, Wang L, Yang M (2018) Incomplete-data oriented multiview dimension reduction via sparse low-rank representation. IEEE Trans Neural Netw Learn Syst 29(12):6276–6291
Li SY, Jiang Y, Zhou ZH (2014) Partial multi-view clustering. In: 28th AAAI conference on artificial intelligence, pp. 1968–1974
Wen J, Sun HJ, Fei LK, Li JX, Zhang Z, Zhang B (2021) Consensus guided incomplete multi-view spectral clustering. Neural Netw 133:207–219
Li P, Chen SC (2020) Shared Gaussian process latent variable model for incomplete multiview clustering. IEEE Trans Cybern 50(1):61–73
Qiao LS, Zhang LM, Chen SC, Shen DG (2018) Data-driven graph construction and graph learning: a review. Neurocomputing 312:336–351
Feng X, Ke S, Shuo Y, Aziz A, Liangtian W, Shirui P, Huan L (2021) Graph learning: a survey. IEEE Trans Artif Intell 2(2):109–127
Wen J, Xu Y, Liu H (2020) Incomplete multiview spectral clustering with adaptive graph learning. IEEE Trans Cybern 50(4):1418–1429
Wen J, Zhang Z, Zhang Z, Fei LK, Wang M (2021) Generalized incomplete multiview clustering with flexible locality structure diffusion. IEEE Trans Cybern 51(1):101–114
Zhang N, Sun S (2022) Incomplete multiview nonnegative representation learning with multiple graphs. Pattern Recogn 123:108412
Wen J, Yan K, Zhang Z, Xu Y, Wang JQ, Fei LK, Zhang B (2021) Adaptive graph completion based incomplete multi-view clustering. IEEE Trans Multimedia 23:2493–2504
Chen J, Wang G, Giannakis GB (2019) Graph multiview canonical correlation analysis. IEEE Trans Signal Process 67(11):2826–2838
Shawe-Taylor J, Cristianini N (2005) Kernel methods for pattern analysis. Cambridge University Press, Cambridge
Zhang EH, Chen XH, Wang LP (2020) Consistent discriminant correlation analysis. Neural Process Lett 52(1):891–904
Wang C (2007) Variational Bayesian approach to canonical correlation analysis. IEEE Trans Neural Netw 18(3):905–910
Carroll JD (1968) Generalization of canonical correlation analysis to three or more sets of variables. In: Proceedings of the 76th annual convention of the american psychological association, vol 3, pp 227–228
Fu X, Huang KJ, Hong MY, Sidiropoulos ND, So AMC (2017) Scalable and flexible multiview max-var canonical correlation analysis. IEEE Trans Signal Process 65(16):4150–4165
Luo Y, Tao DC, Ramamohanarao K, Xu C, Wen YG (2015) Tensor canonical correlation analysis for multi-view dimension reduction. IEEE Trans Knowl Data Eng 27(11):3111–3124
Liu XW, Zhu XZ, Li MM, Wang L, Zhu E, Liu TL, Kloft M, Shen DG, Yin JP, Gao W (2020) Multiple kernel k-means with incomplete kernels. IEEE Trans Pattern Anal Mach Intell 42(5):1191–1204
Recht B, Fazel M, Parrilo PA (2010) Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev 52(3):471–501
Fazel M, Hindi H, Boyd SP (2001) A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the American Control Conference (ACC). IEEE, New York, pp 4734–4739
Kim E, Lee M, Oh S (2015) Elastic-net regularization of singular values for robust subspace learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 915–923
Candes EJ, Guo F (2002) New multiscale transforms, minimum total variation synthesis: applications to edge-preserving image reconstruction. Signal Process 82(11):1519–1543
Cai JF, Candes EJ, Shen ZW (2010) A singular value thresholding algorithm for matrix completion. SIAM J Optim 20(4):1956–1982
Wright TG, Trefethen LN (2001) Large-scale computation of pseudospectra using ARPACK and eigs. SIAM J Sci Comput 23(2):591–605
Maurer A (2006) The rademacher complexity of linear transformation classes. In: Lugosi G, Simon HU (eds) Learning theory, proceedings. Lecture notes in artificial intelligence, vol 40, pp 65–78
Liu TL, Tao DC, Xu D (2016) Dimensionality-dependent generalization bounds for k-dimensional coding schemes. Neural Comput 28(10):2213–2249
Maurer A, Pontil M (2010) K-dimensional coding schemes in hilbert spaces. IEEE Trans Inf Theory 56(11):5839–5846
Zhao H, Liu H, Fu Y (2016) Incomplete multi-modal visual data grouping. In: Proceedings of the 25th international joint conference on artificial intelligence (IJCAI), pp 2392–2398
Wen J, Zhang Z, Xu Y, Zhong ZF (2018) Incomplete multi-view clustering via graph regularized matrix factorization. In: Computer Vision—ECCV 2018 workshops, Pt Iv, vol 11132, pp 593–608
Candes EJ, Recht B (2008) Exact low-rank matrix completion via convex optimization. In: 46th annual allerton conference on communication, control, and computing, pp 806–827
Bartlett PL, Mendelson S (2003) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3(3):463–482
Acknowledgements
The author would like to thank the National Natural Science Foundation of China (Grants 11971231 and 1211530001) for its support.
Appendix A
A.1 Proof of Lemma 1
For convenience, let us denote
Before proving Lemma 1, we first state Lemma 3.
Lemma 3
Let \(Y\in \partial g\left( D \right) \) and \(Y'\in \partial g\left( D' \right) \); then the following inequality holds:
Proof
According to \(Y\in \partial g\left( D \right) \), we have \(Y=\mu {{Y}_{0}}+2\gamma D+\lambda F\), and similarly \({Y}'=\mu {Y'_0}+2\gamma {D}'+\lambda F\), where \(Y_0\in \partial \left\| D \right\| _*\) and \(Y'_0\in \partial \left\| D' \right\| _*\). It follows that
We now show that \(\left\langle {{Y}_{0}}-{Y'_0},D-{D}' \right\rangle >0\). From the definition of \(Y_0\), we have \({{\left\| {{Y}_{0}} \right\| }_{2}}\le 1\) and \(\left\langle {{Y}_{0}},D \right\rangle ={{\left\| D \right\| }_{*}}\); then [34, 45]
therefore
So, Eq. (A1) holds. \(\square \)
The proof of Lemma 1 is given as follows.
Proof
Let \(D^*\) and \(\Lambda ^*\) be the primal and dual optimal solutions of problem (11). Then the optimality condition is
where \(Y^*\in \partial g\left( D^* \right) \) and \(Y^k\in \partial g\left( D^k \right) \) for any k. Then we obtain
and further, according to Lemma 3, we obtain
From \({{P}_{\Omega }}\left( {{D}^{*}} \right) ={{P}_{\Omega }}\left( M \right) \), we have
Let \({{r}_{k}}={{\left\| {{P}_{\Omega }}\left( {{\Lambda }^{k}}-{{\Lambda }^{*}} \right) \right\| }_{F}}\), then the following formula holds according to Eqs. (A2) and (A3),
Since \(0<\rho <4\gamma \), we have \(4\gamma \rho -{{\rho }^{2}}>0\), which further implies the following two properties:
1. The sequence \(\left\{ {{\left\| {{P}_{\Omega }}\left( {{\Lambda }^{k}}-{{\Lambda }^{*}} \right) \right\| }_{F}}\right\} \) is nonincreasing, and it converges since it is bounded below.

2. \(\left\| {{P}_{\Omega }}\left( {{D}^{k}}-{{D}^{*}} \right) \right\| _{F}^{2}\rightarrow 0\) as \(k\rightarrow \infty \).
\(\square \)
A.2 Proof of Theorem 1
According to the above analysis, the optimal solution of each of the two subproblems can be obtained. Therefore, Algorithm 1 makes the objective function value of Eq. (7) decrease monotonically; since the objective is bounded below, convergence is guaranteed. Assume that Algorithm 1 converges to \(A^*\), \(\{W_i^*\}_{i=1}^m\), and \(D^*\); we next prove that this is a KKT point.
Proof
The Lagrangian function of problem (7) is
where \(\Gamma \) is the Lagrange multiplier. Taking the derivatives w.r.t. A, \(\{W_i\}_{i=1}^m\), D, \(\Gamma \), and \(\Lambda \), respectively, and setting them to zero, we obtain the KKT conditions of problem (7):
According to the solving steps for A and \(\{W_i\}_{i=1}^m\), \(A^*\) and \(\{W_i^*\}_{i=1}^m\) satisfy the following equation,
Further, \(A^*\) is obtained by solving the eigenvalue problem of Eq. (10), so Eqs. (A5a), (A5b) and (A5d) are established. Since the essence of SVT is to solve for D through Eq. (A5c), Eq. (A5c) is clearly satisfied. From Lemma 1, \(\left\| {\mathcal {P}_{\Omega }}\left( {{D}^{k}}-{{D}^{*}} \right) \right\| _{F}^{2}\rightarrow 0\) as \(k\rightarrow \infty \), so Eq. (A5e) is satisfied. In summary, Algorithm 1 converges to a KKT point of problem (7). \(\square \)
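The SVT step invoked in this proof admits a compact numerical illustration. The following is a minimal numpy sketch (function and variable names are ours, not from the paper): SVT soft-thresholds the singular values of its input, which is the proximal operator of a multiple of the nuclear norm.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink the singular values of M by tau.
    This is the proximal operator of tau * ||.||_* (the nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # soft-threshold each singular value
    return U @ np.diag(s_shrunk) @ Vt

# Toy check: thresholding never increases the nuclear norm.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))  # rank-2 matrix
D = svt(M, 0.5)
nuc = lambda X: np.linalg.svd(X, compute_uv=False).sum()
assert nuc(D) <= nuc(M)
```

With a sufficiently large threshold, every singular value is shrunk to zero and the result is the zero matrix, matching the intuition that SVT biases the iterate toward low rank.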
A.3 Proof of Lemma 2
Proof
From the definition of \(p_i\left( x\right) \) and some algebraic manipulation, we obtain the following formulation,
Let
where \(w_i=\left[ a^T,-2a^TW_i^T,\text {vec}\left( W_iW_i^T\right) ^T \right] ^T\) and \(\varphi \left( x_i\right) =\left[ a^T,x_i^T,\text {vec}\left( x_ix_i^T\right) ^T \right] ^T\) for \(i=1, 2, \dots , m\). Therefore we can rewrite Eq. (18) as follows,
So it is easy to see that f(x) is a linear function of \(\Phi _{p} \left( x \right) \). We show below that the feature space \(\mathcal {F}\) is derived from \(\hat{k}_p\left( x,y\right) \):
\(\square \)
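The decomposition in this lemma can be checked numerically. The sketch below is our reading of the construction, assuming \(p\left( x\right) =\left\| a-Wx\right\| ^2\) and using a scalar 1 as the first feature coordinate (the paper carries \(a\) itself in both vectors; either convention yields the same inner product). All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
k, d = 3, 4
a = rng.standard_normal(k)
W = rng.standard_normal((k, d))
x = rng.standard_normal(d)

# p(x) = ||a - W x||^2 expands into a constant term a.a, a term linear
# in x, and a term linear in vec(x x^T), so p is linear in phi(x):
w_vec = np.concatenate([[a @ a], -2 * W.T @ a, (W.T @ W).ravel()])
phi = np.concatenate([[1.0], x, np.outer(x, x).ravel()])

p_direct = np.sum((a - W @ x) ** 2)   # direct evaluation
assert np.isclose(p_direct, w_vec @ phi)  # matches <w, phi(x)>
```

This is exactly the observation the lemma relies on: a squared reconstruction error becomes a linear functional after lifting the input to the quadratic feature map.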
A.4 Proof of Theorem 2
Proof
Let us first derive the upper bound on \(\left\| w\right\| _2^2\) as follows,
So \(\left\| w\right\| _2<B\). Based on Lemma 2 and the assumptions, it is easy to see that \(f\left( x\right) \) belongs to the function class
Obviously, \(f\left( x\right) \ge 0\), and it is easy to show that \(f\left( x\right) \) has an upper bound,
As a result, by exploiting McDiarmid's concentration inequality [27, 46], the following inequality holds with probability at least \(1-\delta \):
where \(\hat{R}_n\left( \mathcal {F}_{B}\right) \) is the empirical Rademacher complexity of \(\mathcal {F}_{B}\). Now let us estimate the upper bound of \(\hat{R}_n\left( \mathcal {F}_{B}\right) \). According to its definition, the following inequality holds,
where \(\left\{ \sigma _k\right\} _{k=1}^n\) are i.i.d. Rademacher random variables. The first inequality uses the Cauchy–Schwarz inequality. The last inequality holds because the square root function is concave, so it follows from Jensen's inequality. As a result,
Equation (19) can be proved by combining Eqs. (A8), (A9) and (A10). \(\square \)
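The empirical Rademacher complexity bounded above can also be approximated by Monte Carlo for a finite function class, which is a useful sanity check on such bounds. The sketch below is ours (names and setup hypothetical, not from the paper): it draws i.i.d. Rademacher signs and averages the supremum of the signed empirical sums.

```python
import numpy as np

def empirical_rademacher(F_values, n_draws=2000, seed=0):
    """Monte Carlo estimate of (1/n) E_sigma [ sup_f sum_k sigma_k f(x_k) ]
    for a finite class; F_values is a (num_functions, n) array holding each
    function's values on the fixed sample x_1, ..., x_n."""
    rng = np.random.default_rng(seed)
    m, n = F_values.shape
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)  # i.i.d. Rademacher signs
        total += np.max(F_values @ sigma) / n    # sup over the finite class
    return total / n_draws

# Toy class of 10 functions evaluated on a sample of size n = 50.
vals = np.random.default_rng(2).standard_normal((10, 50))
r_hat = empirical_rademacher(vals)
```

Since the supremum of centered signed sums has nonnegative expectation, the estimate is positive, and it shrinks as the sample size n grows, consistent with the \(O\left( 1/\sqrt{n}\right) \) behavior of bounds of this type.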
Cite this article
Zhang, H., Chen, X., Zhang, E. et al. Incomplete Multi-view Learning via Consensus Graph Completion. Neural Process Lett 55, 3923–3952 (2023). https://doi.org/10.1007/s11063-022-10973-9