Robust semi-supervised data representation and imputation by correntropy based constraint nonnegative matrix factorization

Zhou, Nan; Du, Yuanhua; Liu, Jun; Huang, Xiuyu; Shen, Xiao; Choi, Kup-Sze

doi:10.1007/s10489-022-03884-8

Published: 08 September 2022

Applied Intelligence, Volume 53, pages 11599–11617 (2023)

Nan Zhou^1,2,3 (ORCID: orcid.org/0000-0002-0434-6231),
Yuanhua Du^2,
Jun Liu^2,3,
Xiuyu Huang^3,
Xiao Shen^4 &
Kup-Sze Choi^3


Abstract

Many methods have recently been proposed to reduce the dimensionality of high-dimensional data for representation. Matrix Factorization (MF), an efficient dimension-reduction method, is increasingly used in a wide range of applications. However, MF methods are often unable to handle data with missing entries. In a Semi-Supervised Learning (SSL) scenario, many commonly used missing-value imputation methods, e.g., KNN imputation, cannot exploit the available label information, which is among the most discriminative information in the data. In this paper, taking into account outliers in the observed entries, we propose an algorithm called Correntropy based Constraint Nonnegative Matrix Factorization Completion (CCNMF) for simultaneously constructing a robust representation and imputing the missing entries of high-dimensional data in an SSL scenario. Specifically, the Maximum Correntropy Criterion (MCC) is used to construct the CCNMF model in order to alleviate the negative effects of non-Gaussian noise and outliers in the data. To solve the optimization problem, an iterative algorithm based on the Fenchel Conjugate (FC) and the Block Coordinate Update (BCU) framework is proposed. We show that the proposed algorithm guarantees convergence not only of the objective sequence but also of the iterate sequence. Experiments are conducted on real-world image datasets and a community health dataset. In many cases, the proposed method outperforms several state-of-the-art methods for both representation and imputation.


Data Availability

The ORL and Yale datasets are publicly available. The community health data is available from the corresponding author, Kup-Sze Choi, upon reasonable request.


Acknowledgements

This work was made possible by support from the National Natural Science Foundation of China (No. 11901063), National Key R&D Program of China (No. 2021ZD0112701), the Innovation and Technology Fund of Hong Kong (No. MRP/015/18), the General Research Fund of the Hong Kong Research Grants Council (No. PolyU 152006/19E), and the Scientific Research Fund of the Sichuan Provincial Science and Technology Department (Nos. 2022NSFSC0462, 2021YFG0133, 2021YFG0295, 21ZDYF3598 and 2021YFH0069).

Author information

Authors and Affiliations

Chengdu University, Chengdu, Sichuan, 610106, China
Nan Zhou
Chengdu University of Information Technology, Chengdu, Sichuan, 610225, China
Nan Zhou, Yuanhua Du & Jun Liu
Centre for Smart Health, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Nan Zhou, Jun Liu, Xiuyu Huang & Kup-Sze Choi
School of Computer Science and Cyberspace Security, Hainan University, Danzhou, 571700, China
Xiao Shen

Corresponding authors

Correspondence to Nan Zhou, Yuanhua Du or Kup-Sze Choi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of Theorem 1

Proof

The problem in (19) can be decomposed into n × d independent problems, each involving a single element of B and taking the form

$$ \min\limits_{{\mathcal{P}}_{\Omega}({\mathbf{B}})={\mathcal{P}}_{\Omega}({\mathbf{A}})} \frac{1}{2}{\mathbf{C}}^{2}_{i,j}({\mathbf{B}}_{i,j}-{\mathbf{D}}_{i,j})^{2},\quad \forall i,j. $$

(A.1)

If (i,j) ∈ Ω, the constraint ${\mathcal {P}}_{\Omega }({\mathbf {B}})={\mathcal {P}}_{\Omega }({\mathbf {A}})$ forces ${\mathbf {B}}^{*}_{i,j}$ to equal ${\mathbf {A}}_{i,j}$. If (i,j) ∉ Ω, the constraint does not involve ${\mathbf {B}}_{i,j}$, and problem (A.1) can be written as

$$ \min\limits_{{\mathbf{B}}_{i,j}} \frac{1}{2}{\mathbf{C}}^{2}_{i,j}({\mathbf{B}}_{i,j}-{\mathbf{D}}_{i,j})^{2},\quad \forall (i,j)\notin\Omega, $$

(A.2)

which is a convex quadratic function of ${\mathbf {B}}_{i,j}$ whose minimum value, zero, is attained at ${\mathbf {B}}^{*}_{i,j}={\mathbf {D}}_{i,j}$. Therefore, (20) is an optimal solution of (19). □
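The closed form established in this proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the observed index set Ω is encoded as a boolean mask, and the names `complete_entries`, `A`, `D`, and `mask` are hypothetical, not taken from the paper. The weights ${\mathbf {C}}_{i,j}$ do not appear because they rescale each elementwise term without moving its minimizer.

```python
from typing import List

Matrix = List[List[float]]


def complete_entries(A: Matrix, D: Matrix, mask: List[List[bool]]) -> Matrix:
    """Return B* with B*[i][j] = A[i][j] on observed entries, else D[i][j].

    mask[i][j] is True exactly when (i, j) belongs to the observed set Omega.
    The mask encoding and all names are assumptions made for this sketch.
    """
    return [
        [a if observed else d for a, d, observed in zip(row_a, row_d, row_m)]
        for row_a, row_d, row_m in zip(A, D, mask)
    ]


# Toy example: the zeros in A stand in for unobserved entries.
A = [[1.0, 0.0], [0.0, 4.0]]
D = [[9.0, 2.5], [3.5, 9.0]]
mask = [[True, False], [False, True]]

B_star = complete_entries(A, D, mask)
print(B_star)  # [[1.0, 2.5], [3.5, 4.0]]
```

Observed entries are copied from A to satisfy the constraint, and every other entry takes the value of D, which drives the corresponding quadratic term to zero.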

Appendix B: The Equivalence of (12c) and (15)

Let ${\mathbf {C}} = \nabla _{{\mathbf {X}}} f(\hat {{\mathbf {X}}}^{t},{\mathbf {Z}}^{t},{\mathbf {P}}^{t+1},{\mathbf {B}}^{t})$ and $L = L_{{\mathbf {X}}}^{t}$. Then the objective of (12c) can be written as

$$ \begin{array}{@{}rcl@{}} &&\text{Tr}[({\mathbf{X}}-\hat{\mathbf{X}}^{t})^{\top}{\mathbf{C}}]-{\frac{L}{2}}\text{Tr}[({\mathbf{X}}-\hat{\mathbf{X}}^{t})^{\top}({\mathbf{X}}-\hat{\mathbf{X}}^{t})]\\ &=&\text{Tr}[\mathbf{X}^{\top}\mathbf{C}-(\hat{\mathbf{X}}^{t})^{\top}\mathbf{C}]\\&&-{\frac{L}{2}}\text{Tr}[\mathbf{X}^{\top}\mathbf{X}-2{\mathbf{X}}^{\top}\hat{\mathbf{X}}^{t}+(\hat{\mathbf{X}}^{t})^{\top}\hat{\mathbf{X}}^{t}] \end{array} $$

(B.1)

Eliminating the terms which are independent of X in (B.1), (12c) can be reformulated as follows:

$$ \begin{array}{l} \quad\underset{{\mathbf{X}}\geq 0}{\arg\max}\ \text{Tr}\left( {\mathbf{X}}^{\top}{\mathbf{C}}-{\frac{L}{2}}{\mathbf{X}}^{\top}{\mathbf{X}} + L {\mathbf{X}}^{\top}\hat{\mathbf{X}}^{t}\right)\\ \Leftrightarrow\underset{{\mathbf{X}}\geq 0}{\arg\max}\ -\frac{L}{2} \text{Tr}\left( {\mathbf{X}}^{\top} {\mathbf{X}}-\frac{2}{L}{\mathbf{X}}^{\top} {\mathbf{C}}-2{\mathbf{X}}^{\top} \hat{{\mathbf{X}}}^{t}\right)\\ \Leftrightarrow\underset{{\mathbf{X}}\geq 0}{\arg\min}\ \text{Tr}\left( {\mathbf{X}}^{\top} {\mathbf{X}}-\frac{2}{L}{\mathbf{X}}^{\top} {\mathbf{C}}-2{\mathbf{X}}^{\top} \hat{\mathbf{X}}^{t}\right)\\ \Leftrightarrow\underset{{\mathbf{X}}\geq 0}{\arg\min}\ \text{Tr}\left( {\mathbf{X}}^{\top} {\mathbf{X}}-2{\mathbf{X}}^{\top} \left(\hat{{\mathbf{X}}}^{t}+\frac{1}{L}{\mathbf{C}}\right)\right). \end{array} $$

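Adding the constant $\text{Tr}[(\hat{\mathbf{X}}^{t}+\frac{1}{L}{\mathbf{C}})^{\top}(\hat{\mathbf{X}}^{t}+\frac{1}{L}{\mathbf{C}})]$, which is independent of X, and multiplying the objective by 1/2 completes the square without changing the minimizer. Hence (12c) can be reformulated as

$$ \underset{{\mathbf{X}}\geq 0}{\arg\min}\ \frac{1}{2}\left\|\mathbf{X}-\left( \hat{\mathbf{X}}^{t}+\frac{1}{L}{\mathbf{C}}\right)\right\|_{F}^{2}, $$

which is the formulation of (15).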
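A problem of this form is the Euclidean projection of $\hat{\mathbf{X}}^{t}+\frac{1}{L}{\mathbf{C}}$ onto the nonnegative orthant, and that projection is an elementwise clipping at zero (a standard fact, not restated in this appendix). The sketch below illustrates the resulting update on a toy example. The function name `projected_step` and the sample matrices are hypothetical, and `L` is assumed to be strictly positive.

```python
from typing import List

Matrix = List[List[float]]


def projected_step(X_hat: Matrix, C: Matrix, L: float) -> Matrix:
    """Minimize 0.5 * ||X - (X_hat + C / L)||_F^2 subject to X >= 0.

    The objective separates over entries, so each entry of the minimizer is
    max(0, X_hat[i][j] + C[i][j] / L). Names are assumptions for this sketch.
    """
    if L <= 0:
        raise ValueError("L must be strictly positive")
    return [
        [max(0.0, x + c / L) for x, c in zip(row_x, row_c)]
        for row_x, row_c in zip(X_hat, C)
    ]


# Toy example with values chosen so the arithmetic is exact in floating point.
X_hat = [[1.0, 0.5], [0.0, 2.0]]
C = [[-4.0, 1.0], [2.0, -2.0]]

X_next = projected_step(X_hat, C, L=2.0)
print(X_next)  # [[0.0, 1.0], [1.0, 1.0]]
```

The first entry shows the constraint becoming active: the unconstrained value 1.0 + (-4.0)/2.0 = -1.0 is clipped to zero, while the remaining entries are left at their unconstrained values.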

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Zhou, N., Du, Y., Liu, J. et al. Robust semi-supervised data representation and imputation by correntropy based constraint nonnegative matrix factorization. Appl Intell 53, 11599–11617 (2023). https://doi.org/10.1007/s10489-022-03884-8


Accepted: 10 June 2022
Published: 08 September 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s10489-022-03884-8
