Incomplete Multi-view Learning via Consensus Graph Completion

Neural Processing Letters

Abstract

Traditional graph-based multi-view learning methods usually assume that the data are complete. However, several instances of some views may be missing, which makes the corresponding graphs incomplete and weakens the benefit of graph regularization. To mitigate this negative effect, a novel method, called incomplete multi-view learning via consensus graph completion (IMLCGC), is proposed in this paper. It completes the incomplete graphs based on the consensus among different views and then fuses the completed graphs into a common graph. Specifically, IMLCGC develops a learning framework for incomplete multi-view data that contains three components: consensus low-dimensional representation, graph regularization, and consensus graph completion. Furthermore, a generalization error bound of the model is established based on Rademacher complexity, which theoretically shows that learning from incomplete multi-view data is difficult. Experimental results on six well-known datasets indicate that IMLCGC significantly outperforms state-of-the-art methods.


Notes

  1. http://archive.ics.uci.edu/ml/datasets/Multiple+Features.

  2. http://www.cs.cmu.edu/afs/cs/project/theo-11/www/wwkb/.

  3. http://lig-membres.imag.fr/grimal/data.html.

  4. http://lig-membres.imag.fr/grimal/data.html.

  5. http://cam-orl.co.uk/facedatabase.html.

  6. http://yann.lecun.com/exdb/mnist/.

References

1. Zhao J, Xie XJ, Xu X, Sun SL (2017) Multi-view learning overview: recent progress and new challenges. Inf Fusion 38:43–54

2. Cai WL, Zhou HH, Xu L (2021) A multi-view co-training clustering algorithm based on global and local structure preserving. IEEE Access 9:29293–29302

3. Kumar A, Daumé H (2011) A co-training approach for multi-view spectral clustering. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 393–400

4. Liu C, Yuen PC (2011) A boosted co-training algorithm for human action recognition. IEEE Trans Circuits Syst Video Technol 21(9):1203–1213

5. Yang XH, Liu WF, Liu W, Tao DC (2021) A survey on canonical correlation analysis. IEEE Trans Knowl Data Eng 33(6):2349–2368

6. Brbic M, Kopriva I (2018) Multi-view low-rank sparse subspace clustering. Pattern Recogn 73:247–258

7. Zhao Y, You X, Yu S, Xu C, Yuan W, Jing X-Y, Zhang T, Tao D (2018) Multi-view manifold learning with locality alignment. Pattern Recogn 78:154–166

8. Xie XJ, Sun SL (2019) General multi-view learning with maximum entropy discrimination. Neurocomputing 332:184–192

9. Liu XW, Dou Y, Yin JP, Wang L, Zhu E (2016) Multiple kernel k-means clustering with matrix-induced regularization. In: 30th AAAI conference on artificial intelligence, pp 1888–1894

10. Chao GQ, Sun SL (2016) Multi-kernel maximum entropy discrimination for multi-view learning. Intell Data Anal 20(3):481–493

11. Zhao W, Xu C, Guan ZY, Liu Y (2021) Multiview concept learning via deep matrix factorization. IEEE Trans Neural Netw Learn Syst 32(2):814–825

12. Yan XQ, Hu SZ, Mao YQ, Ye YD, Yu H (2021) Deep multi-view learning methods: a review. Neurocomputing 448:106–129

13. Sun G, Cong Y, Zhang YL, Zhao GS, Fu Y (2021) Continual multiview task learning via deep matrix factorization. IEEE Trans Neural Netw Learn Syst 32(1):139–150

14. Tan G, Wang Z, Shi Z (2021) Proportional-integral state estimator for quaternion-valued neural networks with time-varying delays. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2021.3103979

15. Liu Y, Fan L, Zhang C, Zhou T, Xiao Z, Geng L, Shen D (2021) Incomplete multi-modal representation learning for Alzheimer’s disease diagnosis. Med Image Anal. https://doi.org/10.1016/j.media.2020.101953

16. Yang WQ, Shi YH, Gao Y, Wang L, Yang M (2018) Incomplete-data oriented multiview dimension reduction via sparse low-rank representation. IEEE Trans Neural Netw Learn Syst 29(12):6276–6291

17. Li SY, Jiang Y, Zhou ZH (2014) Partial multi-view clustering. In: 28th AAAI conference on artificial intelligence, pp 1968–1974

18. Wen J, Sun HJ, Fei LK, Li JX, Zhang Z, Zhang B (2021) Consensus guided incomplete multi-view spectral clustering. Neural Netw 133:207–219

19. Li P, Chen SC (2020) Shared Gaussian process latent variable model for incomplete multiview clustering. IEEE Trans Cybern 50(1):61–73

20. Qiao LS, Zhang LM, Chen SC, Shen DG (2018) Data-driven graph construction and graph learning: a review. Neurocomputing 312:336–351

21. Feng X, Ke S, Shuo Y, Aziz A, Liangtian W, Shirui P, Huan L (2021) Graph learning: a survey. IEEE Trans Artif Intell 2(2):109–127

22. Wen J, Xu Y, Liu H (2020) Incomplete multiview spectral clustering with adaptive graph learning. IEEE Trans Cybern 50(4):1418–1429

23. Wen J, Zhang Z, Zhang Z, Fei LK, Wang M (2021) Generalized incomplete multiview clustering with flexible locality structure diffusion. IEEE Trans Cybern 51(1):101–114

24. Zhang N, Sun S (2022) Incomplete multiview nonnegative representation learning with multiple graphs. Pattern Recogn 123:108412

25. Wen J, Yan K, Zhang Z, Xu Y, Wang JQ, Fei LK, Zhang B (2021) Adaptive graph completion based incomplete multi-view clustering. IEEE Trans Multimedia 23:2493–2504

26. Chen J, Wang G, Giannakis GB (2019) Graph multiview canonical correlation analysis. IEEE Trans Signal Process 67(11):2826–2838

27. Shawe-Taylor J, Cristianini N (2005) Kernel methods for pattern analysis. Cambridge University Press, Cambridge

28. Zhang EH, Chen XH, Wang LP (2020) Consistent discriminant correlation analysis. Neural Process Lett 52(1):891–904

29. Wang C (2007) Variational Bayesian approach to canonical correlation analysis. IEEE Trans Neural Netw 18(3):905–910

30. Carroll JD (1968) Generalization of canonical correlation analysis to three or more sets of variables. In: Proceedings of the 76th annual convention of the American Psychological Association, vol 3, pp 227–228

31. Fu X, Huang KJ, Hong MY, Sidiropoulos ND, So AMC (2017) Scalable and flexible multiview max-var canonical correlation analysis. IEEE Trans Signal Process 65(16):4150–4165

32. Luo Y, Tao DC, Ramamohanarao K, Xu C, Wen YG (2015) Tensor canonical correlation analysis for multi-view dimension reduction. IEEE Trans Knowl Data Eng 27(11):3111–3124

33. Liu XW, Zhu XZ, Li MM, Wang L, Zhu E, Liu TL, Kloft M, Shen DG, Yin JP, Gao W (2020) Multiple kernel k-means with incomplete kernels. IEEE Trans Pattern Anal Mach Intell 42(5):1191–1204

34. Recht B, Fazel M, Parrilo PA (2010) Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev 52(3):471–501

35. Fazel M, Hindi H, Boyd SP (2001) A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the American Control Conference (ACC). IEEE, New York, pp 4734–4739

36. Kim E, Lee M, Oh S. Elastic-net regularization of singular values for robust subspace learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 915–923

37. Candes EJ, Guo F (2002) New multiscale transforms, minimum total variation synthesis: applications to edge-preserving image reconstruction. Signal Process 82(11):1519–1543

38. Cai JF, Candes EJ, Shen ZW (2010) A singular value thresholding algorithm for matrix completion. SIAM J Optim 20(4):1956–1982

39. Wright TG, Trefethen LN (2001) Large-scale computation of pseudospectra using ARPACK and eigs. SIAM J Sci Comput 23(2):591–605

40. Maurer A (2006) The Rademacher complexity of linear transformation classes. In: Lugosi G, Simon HU (eds) Learning theory, proceedings. Lecture notes in artificial intelligence, vol 40, pp 65–78

41. Liu TL, Tao DC, Xu D (2016) Dimensionality-dependent generalization bounds for k-dimensional coding schemes. Neural Comput 28(10):2213–2249

42. Maurer A, Pontil M (2010) K-dimensional coding schemes in Hilbert spaces. IEEE Trans Inf Theory 56(11):5839–5846

43. Zhao H, Liu H, Fu Y. Incomplete multi-modal visual data grouping. In: IJCAI, pp 2392–2398

44. Wen J, Zhang Z, Xu Y, Zhong ZF (2018) Incomplete multi-view clustering via graph regularized matrix factorization. In: Computer Vision—ECCV 2018 workshops, Pt IV, vol 11132, pp 593–608

45. Candes EJ, Recht B (2008) Exact low-rank matrix completion via convex optimization. In: 46th annual Allerton conference on communication, control, and computing, pp 806–827

46. Bartlett PL, Mendelson S (2003) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3(3):463–482


Acknowledgements

The author would like to thank the National Natural Science Foundation of China (Grants 11971231 and 1211530001) for its support.

Author information

Corresponding author

Correspondence to Xiaohong Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

1.1 A.1 Proof of Lemma 1

For convenience, let’s denote

$$\begin{aligned} g\left( D \right) =\mu {{\left\| D \right\| }_{*}}+\gamma \left\| D \right\| _{F}^{2}+\lambda tr\left( {{F}^{T}}D \right) . \end{aligned}$$

Before giving proof of Lemma 1, we first give the Lemma 3.

Lemma 3

Let \(Y\in \partial g\left( D \right) \) and \(Y'\in \partial g\left( D' \right) \). Then the following inequality holds:

$$\begin{aligned} \left\langle Y-{Y}',D-{D}' \right\rangle \ge 2\gamma \left\| D-{D}' \right\| _{F}^{2}. \end{aligned}$$
(A1)

Proof

According to \(Y\in \partial g\left( D \right) \), we have \(Y=\mu {{Y}_{0}}+2\gamma D+\lambda F\), and similarly \({Y}'=\mu {Y'_0}+2\gamma {D}'+\lambda F\), where \(Y_0\in \partial \left\| D \right\| _*\) and \(Y'_0\in \partial \left\| D' \right\| _*\). Then

$$\begin{aligned} \left\langle Y-Y',D-D' \right\rangle =\mu \left\langle {Y_0}-Y'_0,D-D' \right\rangle +2\gamma \left\| D-{D}' \right\| _{F}^{2}. \end{aligned}$$

We now show that \(\left\langle {{Y}_{0}}-{Y'_0},D-{D}' \right\rangle \ge 0\). From the definition of \(Y_0\), we have \({{\left\| {{Y}_{0}} \right\| }_{2}}\le 1\) and \(\left\langle {{Y}_{0}},D \right\rangle ={{\left\| D \right\| }_{*}}\); then [34, 45]

$$\begin{aligned} \left\langle {{Y}_{0}},{D}' \right\rangle \le {{\left\| {{Y}_{0}} \right\| }_{2}}{{\left\| {{D}'} \right\| }_{*}}\le {{\left\| {{D}'} \right\| }_{*}}, \end{aligned}$$

therefore

$$\begin{aligned} \left\langle {{Y}_{0}}-{Y'_0},D-{D}' \right\rangle ={{\left\| D \right\| }_{*}}+{{\left\| {{D}'} \right\| }_{*}}-\left\langle {Y'_0},D \right\rangle -\left\langle {{Y}_{0}},{D}' \right\rangle \ge 0, \end{aligned}$$

So, Eq. (A1) holds. \(\square \)
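The strong monotonicity in Lemma 3 can be checked numerically. Below is a minimal Python/NumPy sketch (not part of the original paper) that builds the subgradient \(Y=\mu Y_0+2\gamma D+\lambda F\) using the standard nuclear-norm subgradient \(Y_0=UV^T\) from the SVD of \(D\), and verifies \(\left\langle Y-Y',D-D' \right\rangle \ge 2\gamma \left\| D-D' \right\| _F^2\) on random matrices; the matrix sizes and parameter values are arbitrary choices for illustration.

```python
import numpy as np

def subgrad_g(D, F, mu, gamma, lam):
    """One element of the subdifferential of
       g(D) = mu*||D||_* + gamma*||D||_F^2 + lam*tr(F^T D),
       using Y0 = U V^T from the SVD of D (a valid nuclear-norm subgradient)."""
    U, _, Vt = np.linalg.svd(D, full_matrices=False)
    return mu * (U @ Vt) + 2.0 * gamma * D + lam * F

rng = np.random.default_rng(0)
mu, gamma, lam = 1.0, 0.5, 0.1                    # illustrative parameter values
F = rng.standard_normal((30, 30))
D1 = rng.standard_normal((30, 30))
D2 = rng.standard_normal((30, 30))

Y1 = subgrad_g(D1, F, mu, gamma, lam)
Y2 = subgrad_g(D2, F, mu, gamma, lam)
lhs = np.sum((Y1 - Y2) * (D1 - D2))               # <Y - Y', D - D'>
rhs = 2.0 * gamma * np.linalg.norm(D1 - D2, 'fro') ** 2
print(lhs >= rhs - 1e-8)                          # Lemma 3 predicts True
```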

The proof of Lemma 1 is given as follows.

Proof

Let \(D^*\) and \(\Lambda ^*\) be the primal and dual optimal solutions of problem (11). Then the optimality conditions are

$$\begin{aligned} {{Y}^{k}}={{P}_{\Omega }}\left( {{\Lambda }^{k-1}} \right) ,\quad {{Y}^{*}}={{P}_{\Omega }}\left( {{\Lambda }^{*}} \right) , \end{aligned}$$

where \(Y^*\in \partial g\left( D^* \right) \), \(Y^k\in \partial g\left( D^k \right) \) for any k. Then we can get

$$\begin{aligned} {{Y}^{k}}-{{Y}^{*}}={{P}_{\Omega }}\left( {{\Lambda }^{k-1}}-{{\Lambda }^{*}} \right) \end{aligned}$$

and further, by Lemma 3, we obtain

$$\begin{aligned} \left\langle {{P}_{\Omega }}\left( {{\Lambda }^{k-1}}-{{\Lambda }^{*}} \right) ,{{D}^{k}}-{{D}^{*}} \right\rangle =\left\langle {{Y}^{k}}-{{Y}^{*}},{{D}^{k}}-{{D}^{*}} \right\rangle \ge 2\gamma \left\| {{D}^{k}}-{{D}^{*}} \right\| _{F}^{2}. \end{aligned}$$
(A2)

From \({{P}_{\Omega }}\left( {{D}^{*}} \right) ={{P}_{\Omega }}\left( M \right) \), we have

$$\begin{aligned} {{\left\| {{P}_{\Omega }}\left( {{\Lambda }^{k}}-{{\Lambda }^{*}} \right) \right\| }_{F}}={{\left\| {{P}_{\Omega }}\left( {{\Lambda }^{k-1}}-{{\Lambda }^{*}} \right) +\rho {{P}_{\Omega }}\left( {{D}^{*}}-{{D}^{k}} \right) \right\| }_{F}}. \end{aligned}$$
(A3)

Let \({{r}_{k}}={{\left\| {{P}_{\Omega }}\left( {{\Lambda }^{k}}-{{\Lambda }^{*}} \right) \right\| }_{F}}\), then the following formula holds according to Eqs. (A2) and (A3),

$$\begin{aligned} \begin{aligned} r_{k}^{2}&=r_{k-1}^{2}-2\rho \left\langle {{P}_{\Omega }}\left( {{\Lambda }^{k-1}}-{{\Lambda }^{*}} \right) ,{{D}^{k}}-{{D}^{*}} \right\rangle +{{\rho }^{2}}\left\| {{P}_{\Omega }}\left( {{D}^{k}}-{{D}^{*}} \right) \right\| _{F}^{2} \\&\le r_{k-1}^{2}-\left( 4\gamma \rho -{{\rho }^{2}} \right) \left\| {{P}_{\Omega }}\left( {{D}^{k}}-{{D}^{*}} \right) \right\| _{F}^{2} . \end{aligned} \end{aligned}$$

Since \(0<\rho <4\gamma \), we have \(4\gamma \rho -{{\rho }^{2}}>0\), which yields the following two properties:

  1. 1.

The sequence \(\left\{ {{\left\| {{P}_{\Omega }}\left( {{\Lambda }^{k}}-{{\Lambda }^{*}} \right) \right\| }_{F}}\right\} \) is nonincreasing and converges because it is bounded below.

  2. 2.

Consequently, \(\left\| {{P}_{\Omega }}\left( {{D}^{k}}-{{D}^{*}} \right) \right\| _{F}^{2}\rightarrow 0\) as \(k\rightarrow \infty \).

\(\square \)
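For concreteness, the iteration analyzed above can be sketched in code. The snippet below is an illustrative Python/NumPy reconstruction, not the authors' implementation: it assumes the subproblem (11) has the form \(\min _D\, g\left( D\right) \) s.t. \({{P}_{\Omega }}\left( D \right) ={{P}_{\Omega }}\left( M \right) \), with the dual update \({{\Lambda }^{k}}={{\Lambda }^{k-1}}+\rho {{P}_{\Omega }}\left( M-{{D}^{k}} \right) \) implied by Eq. (A3), and that the \(D\)-step is solved in closed form by singular value thresholding [38] after completing the square in \(g\left( D\right) -\left\langle {{P}_{\Omega }}\left( \Lambda \right) ,D \right\rangle \). The function and argument names are hypothetical.

```python
import numpy as np

def sv_shrink(X, tau):
    """Singular value shrinkage: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def consensus_graph_completion(M, mask, F, mu, gamma, lam, rho, n_iter=300):
    """SVT/Uzawa-style iteration for the assumed subproblem
         min_D  mu*||D||_* + gamma*||D||_F^2 + lam*tr(F^T D)
         s.t.   P_Omega(D) = P_Omega(M),
       where `mask` encodes Omega (1 = observed, 0 = missing).
       Lemma 1 requires 0 < rho < 4*gamma for the convergence argument."""
    assert 0.0 < rho < 4.0 * gamma
    Lam = np.zeros_like(M)
    D = np.zeros_like(M)
    for _ in range(n_iter):
        # D-step: minimizing g(D) - <P_Omega(Lam), D> reduces to one shrinkage step
        D = sv_shrink((mask * Lam - lam * F) / (2.0 * gamma), mu / (2.0 * gamma))
        # dual ascent on the observed entries, cf. Eq. (A3)
        Lam = Lam + rho * mask * (M - D)
    return D
```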

1.2 A.2 Proof of Theorem 1

According to the above analysis, the optimal solution of each of the two subproblems can be obtained. Therefore, Algorithm 1 makes the objective function value of Eq. (7) decrease monotonically, and since the objective is bounded below, convergence is guaranteed. Suppose Algorithm 1 converges to \(A^*\), \(\{W_i^*\}_{i=1}^m\), and \(D^*\); we now prove that this limit is a KKT point.

Proof

The Lagrangian function of problem (7) is

$$\begin{aligned} \mathcal {L}_2\left( A,\{W_i\}_{i=1}^m,D,\Gamma ,\Lambda \right) =\sum \limits _{i=1}^{m}{\left\| \left( A-W_i^TX_i\right) P_i\right\| _F^2}+\left\langle \Gamma ,\left( AA^T-I\right) \right\rangle +\mathcal {L}_1\left( D,\Lambda \right) , \end{aligned}$$
(A4)

where \(\Gamma \) is the Lagrange multiplier. Taking the derivatives w.r.t. A, \(\{W_i\}_{i=1}^m\), D, \(\Gamma \), and \(\Lambda \), respectively, and setting them to zero, we obtain the KKT conditions of problem (7):

[KKT conditions (A5a)–(A5e); displayed as an equation image in the original.]

According to the solving step of A and \(\{W_i\}_{i=1}^m\), \(A^*\) and \(\{W_i^*\}_{i=1}^m\) satisfy the following equation,

$$\begin{aligned} W_i^*=\left( X_iP_iX_i^T\right) ^{-1}X_iP_i{A^*}^T \qquad i=1,2,\cdots ,m. \end{aligned}$$
(A6)

Further, \(A^*\) is obtained by solving the eigenvalue problem of Eq. (10), so Eqs. (A5a), (A5b) and (A5d) are established. Since the essence of SVT is to solve D through Eq. (A5c), Eq. (A5c) is also satisfied. We know from Lemma 1 that \(\left\| {\mathcal {P}_{\Omega }}\left( {{D}^{k}}-{{D}^{*}} \right) \right\| _{F}^{2}\rightarrow 0\) as \(k\rightarrow \infty \), so Eq. (A5e) is satisfied. In summary, Algorithm 1 converges to a KKT point of problem (7). \(\square \)
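The per-view projection update of Eq. (A6) has a simple closed form. The sketch below (Python/NumPy, not taken from the paper) assumes \(X_i\) is the \(d_i\times n\) data matrix of view i, \(P_i=\textrm{diag}\left( p_i\right) \) is the 0/1 diagonal indicator of instances present in view i, and A is the \(k\times n\) consensus representation; the small ridge term is an added assumption to keep the solve stable when \(X_iP_iX_i^T\) is nearly singular.

```python
import numpy as np

def update_projection(X_i, p_i, A, ridge=1e-8):
    """Closed-form update of Eq. (A6):
         W_i = (X_i P_i X_i^T)^{-1} X_i P_i A^T,
       where p_i is the length-n 0/1 presence indicator for view i.
       `ridge` is a small regularizer (an added assumption, not in the paper)."""
    XP = X_i * p_i                        # X_i P_i, shape (d_i, n)
    G = XP @ X_i.T                        # X_i P_i X_i^T, shape (d_i, d_i)
    G = G + ridge * np.eye(G.shape[0])
    return np.linalg.solve(G, XP @ A.T)   # W_i, shape (d_i, k)
```

Algorithm 1 then alternates this update with the eigen-decomposition step for A (Eq. (10)) and the completion step for D, which is exactly the structure the convergence argument above relies on.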

1.3 A.3 Proof of Lemma 2

Proof

From the definition of \(p_i\left( x\right) \) and some algebraic manipulation, we obtain the following formulation:

$$\begin{aligned} \begin{aligned} f(x)&=\sum \limits _{i=1}^{m}{\left\| (a-W_i^Tx_i)p_i\left( x\right) \right\| _2^2} \\&=\sum \limits _{i=1}^{m}{\left[ p_i\left( x\right) a^Ta-2p_i\left( x\right) a^TW_i^Tx_i+p_i\left( x\right) x_i^TW_iW_i^Tx_i \right] } \\&=\sum \limits _{i=1}^{m}{p_i\left( x\right) \left[ a^Ta-2\left( {{a}^{T}}{{W}_i}^T\right) {{x}_i} +\text {vec}{{\left( {{W}_i}W^{T}_i \right) }^{T}}\text {vec}\left( {{x}_i}{{x}_i}^{T} \right) \right] }. \end{aligned} \end{aligned}$$

Let

$$\begin{aligned} \begin{aligned} w&={{\left[ {{w}_{1}}^{T},{{w}_{2}}^{T},\cdots ,{{w}_{m}}^{T} \right] }^{T}}, \\ {{\Phi }_{p}}\left( {{x}} \right)&={{\left[ p_{1}\left( x\right) \varphi {{\left( x_{1} \right) }^{T}},p_{2}\left( x\right) \varphi {{\left( x_{2} \right) }^{T}},\cdots , p_{m}\left( x\right) \varphi {{\left( x_{m} \right) }^{T}} \right] }^{T}}, \\ \end{aligned} \end{aligned}$$

where \(w_i=\left[ a^T,-2a^TW_i^T,\text {vec}\left( W_iW_i^T\right) ^T \right] ^T\) and \(\varphi \left( x_i\right) =\left[ a^T,x_i^T,\text {vec}\left( x_ix_i^T\right) ^T \right] ^T\) for \(i=1, 2, \dots , m\). Therefore we can rewrite Eq. (18) as follows,

$$\begin{aligned} f(x)=\left\langle w,\Phi _{p} \left( x \right) \right\rangle . \end{aligned}$$
(A7)

Thus, f(x) is a linear function of \(\Phi _{p} \left( x \right) \). It is shown below that the feature space \(\mathcal {F}\) is induced by the kernel \(\hat{k}_p\left( x,y\right) \):

$$\begin{aligned} \begin{aligned} \hat{k}_p\left( x,y\right)&=\sum \limits _{i=1}^{m}{p_i\left( x\right) p_i\left( y\right) \left[ a^Ta+k\left( x_i,y_i\right) +k\left( x_i,y_i\right) ^2\right] } \\&=\sum \limits _{i=1}^{m}{p_i\left( x\right) p_i\left( y\right) \left[ a^Ta+{x_i}^Ty_i+{x_i}^Ty_i{x_i}^Ty_i\right] } \\&=\left\langle \Phi _p \left( x \right) ,\Phi _p \left( y \right) \right\rangle . \end{aligned} \end{aligned}$$

\(\square \)
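To make the construction concrete, the snippet below (an illustrative sketch, not code from the paper) evaluates \(\hat{k}_p\left( x,y\right) \) directly from the last displayed formula. It assumes each multi-view sample is given as a list of per-view vectors together with 0/1 presence flags playing the role of \(p_i\left( \cdot \right) \); the function and argument names are hypothetical.

```python
import numpy as np

def k_hat_p(x_views, x_present, y_views, y_present, a):
    """Incomplete multi-view kernel of Lemma 2:
         k_hat_p(x, y) = sum_i p_i(x) p_i(y) * (a^T a + x_i^T y_i + (x_i^T y_i)^2),
       where x_views[i], y_views[i] are the view-i feature vectors (NumPy arrays)
       and x_present[i], y_present[i] in {0, 1} indicate whether view i is observed."""
    val = 0.0
    for x_i, px, y_i, py in zip(x_views, x_present, y_views, y_present):
        if px and py:
            lin = float(x_i @ y_i)        # k(x_i, y_i) = x_i^T y_i
            val += float(a @ a) + lin + lin ** 2
    return val
```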

1.4 A.4 Proof of Theorem 2

Proof

We first derive an upper bound on \(\left\| w\right\| _2^2\):

$$\begin{aligned} \begin{aligned} \left\| w\right\| _2^2&=\sum \limits _{i=1}^{m}{\left\| w_i\right\| _2^2} \\&=\sum \limits _{i=1}^{m}{\left[ a^Ta+4a^TW_i^TW_ia+\text {vec}\left( W_i^TW_i\right) ^T\text {vec}\left( W_iW_i^T\right) \right] } \\&=\sum \limits _{i=1}^{m}{\left[ a^Ta+4\text {vec}\left( W_i^TW_i\right) ^T\text {vec}\left( aa^T\right) +\text {vec}\left( W_iW_i^T\right) ^T\text {vec}\left( W_iW_i^T\right) \right] } \\&\le m\left( c_1^2+4c_1^2c_2+c_2^2\right) . \end{aligned} \end{aligned}$$

So \(\left\| w\right\| _2\le B\). Based on Lemma 2 and the assumptions, it is easy to see that \(f\left( x\right) \) belongs to the function class

$$\begin{aligned} \mathcal {F}_B=\left\{ x\rightarrow \left\langle w,\Phi _p\left( x\right) \right\rangle :\left\| w\right\| _2\le B\right\} . \end{aligned}$$

Obviously, \(f\left( x\right) \ge 0\), and it is easy to show that \(f\left( x\right) \) has an upper bound:

$$\begin{aligned} \begin{aligned} f\left( x\right)&=\left\langle w,\Phi _p\left( x\right) \right\rangle \le \left\| w\right\| _2\left\| \Phi _p\left( x\right) \right\| _2\\&=B\sqrt{\left\langle \Phi _p\left( x\right) ,\Phi _p\left( x \right) \right\rangle }=B\sqrt{\hat{k}_p\left( x,x\right) } \\&\le BR. \end{aligned} \end{aligned}$$

By exploiting McDiarmid’s concentration inequality [27, 46], the following inequality holds with probability at least \(1-\delta \):

$$\begin{aligned} \mathbb {E}\left[ f\left( x \right) \right] \le \frac{1}{n}\sum \limits _{k=1}^{n}{f\left( {{x}_{k}} \right) }+2\hat{R}_n\left( \mathcal {F}_B \right) +3BR\sqrt{\frac{\ln \left( {2}/{\delta }\; \right) }{2n}}, \end{aligned}$$
(A8)

where \(\hat{R}_n\left( \mathcal {F}_{B}\right) \) is the empirical Rademacher complexity of \(\mathcal {F}_{B}\). We now estimate an upper bound of \(\hat{R}_n\left( \mathcal {F}_{B}\right) \). According to its definition, the following inequalities hold:

$$\begin{aligned} \begin{aligned} \hat{R}_n\left( \mathcal {F}_{B}\right)&=\frac{1}{n}\mathbb {E}_\sigma \left[ \underset{f\in \mathcal {F}_{B}}{\mathop {\sup }}\sum \limits _{k=1}^{n}{\sigma _kf\left( x^{\left( k\right) }\right) }\right] \\&=\frac{1}{n}{\mathbb {E}_{\sigma }}\left\{ \underset{w,p}{\mathop {\sup }}\,\sum \limits _{k=1}^{n}{\sigma _k\left\langle w,\Phi _p\left( x^{\left( k\right) }\right) \right\rangle } \right\} \\&=\frac{1}{n}\mathbb {E}_\sigma \left\{ \underset{w,p}{\mathop {\sup }}\left\langle w,\sum \limits _{k=1}^{n}{\sigma _k\Phi _p\left( {{x}^{\left( k\right) }}\right) }\right\rangle \right\} \\&\le \frac{B}{n}{{\mathbb {E}}_{\sigma }}\left\{ \underset{p}{\mathop {\sup }}\,\sqrt{\left\langle \sum \limits _{k=1}^{n}{{{\sigma }_{k}}{{\Phi }_{p}}\left( {{x}^{\left( k\right) }} \right) },\sum \limits _{l=1}^{n}{{{\sigma }_{l}}{{\Phi }_{p}}\left( {{x}^{\left( l\right) }} \right) } \right\rangle } \right\} \\&=\frac{B}{n}{{\mathbb {E}}_{\sigma }}\left\{ \underset{p}{\mathop {\sup }}\,\sqrt{\sum \limits _{k,l=1}^{n}{{{\sigma }_{k}}{{\sigma }_{l}}\left\langle {{\Phi }_{p}}\left( {{x}^{\left( k\right) }} \right) ,{{\Phi }_{p}}\left( {{x}^{\left( l\right) }} \right) \right\rangle }} \right\} \\&\le \frac{B}{n}\sqrt{{{\mathbb {E}}_{\sigma }}\left\{ \underset{p}{\mathop {\sup }}\,\left[ \sum \limits _{k,l=1}^{n}{{{\sigma }_{k}}{{\sigma }_{l}}\hat{k}_p\left( x^{\left( k\right) },x^{\left( l\right) }\right) }\right] \right\} }, \\ \end{aligned} \end{aligned}$$
(A9)

where \(\left\{ \sigma _k\right\} _{k=1}^n\) are i.i.d. Rademacher random variables. The first inequality uses the Cauchy–Schwarz inequality, and the last inequality follows from Jensen’s inequality because the square root function is concave. Denote

$$\begin{aligned} {\Psi }_{p}={{\mathbb {E}}_{\sigma }}\left\{ \underset{p}{\mathop {\sup }}\,\left[ \sum \limits _{k,l=1}^{n}{{{\sigma }_{k}}{{\sigma }_{l}}\hat{k}_p\left( x^{\left( k\right) },x^{\left( l\right) }\right) }\right] \right\} . \end{aligned}$$
(A10)

Equation (19) can be proved by combining Eqs. (A8), (A9) and (A10). \(\square \)
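The complexity term in Eq. (A10) can also be estimated empirically. The sketch below (illustrative only, not from the paper) draws Rademacher vectors and averages \({{\sigma }^{T}}\hat{K}\sigma \) over a finite set of precomputed kernel matrices, one per candidate missing pattern, with entries \(\hat{k}_p\left( x^{\left( k\right) },x^{\left( l\right) }\right) \); restricting the supremum to a finite candidate set is an assumption made here for tractability. The function name and arguments are hypothetical.

```python
import numpy as np

def estimate_psi(K_list, n_draws=2000, seed=0):
    """Monte Carlo estimate of Eq. (A10),
         Psi_p = E_sigma[ sup_p sum_{k,l} sigma_k sigma_l k_hat_p(x^(k), x^(l)) ],
       where K_list holds one precomputed kernel matrix per candidate missing
       pattern p (the sup is taken over this finite candidate set only)."""
    rng = np.random.default_rng(seed)
    n = K_list[0].shape[0]
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)   # i.i.d. Rademacher variables
        total += max(float(sigma @ K @ sigma) for K in K_list)
    return total / n_draws

# Eq. (A9) then bounds the empirical Rademacher complexity by (B / n) * sqrt(Psi_p).
```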

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, H., Chen, X., Zhang, E. et al. Incomplete Multi-view Learning via Consensus Graph Completion. Neural Process Lett 55, 3923–3952 (2023). https://doi.org/10.1007/s11063-022-10973-9

