
Pattern Recognition

Volume 90, June 2019, Pages 464-475

Elastic nonnegative matrix factorization

https://doi.org/10.1016/j.patcog.2018.07.007

Highlights

  • ENMF introduces an elastic loss that takes advantage of both the Frobenius norm and the ℓ2,1 norm when the noise distribution is uncertain, making ENMF far less sensitive to noise and outliers.

  • ENMF takes the geometric information of the projected data points in the low dimensional manifold as feedback to construct the affinity graph, hence ENMF can handle the situation where a few exceptional data pairs are close in the original space but far away from each other in the manifold.

  • ENMF utilizes the exclusive LASSO to enhance the intra-cluster competition, and therefore the “winner” is more likely to stand out while the “loser” tends to recede in a sparse manner.

  • ENMF provides consistently better clustering results on several well-known data sets as compared to standard NMF and several other variants of the NMF algorithm.

Abstract

Nonnegative matrix factorization (NMF) plays a vital role in data mining and machine learning. Standard NMF utilizes the Frobenius norm while robust NMF uses the ℓ2,1-norm to measure the quality of factorization, under the assumptions of an i.i.d. Gaussian noise model and an i.i.d. Laplacian noise model, respectively. In this paper, we propose a novel elastic loss which interpolates adaptively between the Frobenius norm and the ℓ2,1-norm. Based on it, we derive an elastic NMF model guided by the elastic loss that incorporates manifold geometry information while enforcing sparsity of coefficients at the intra-cluster level via the ℓ1,2-norm. The new formulation is more robust to noise while preserving a stronger clustering capability. We propose an EM-like algorithm (using an auxiliary function) to solve the resultant optimization problem, whose convergence can be rigorously proved. Extensive experiments demonstrate the effectiveness of the novel elastic NMF model on benchmarks.

Introduction

In data mining and machine learning, the input data matrix in many applications is of very high dimension, hence dimensionality reduction is a crucial preprocessing step. The most widely used dimensionality reduction methods include Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Non-negative Matrix Factorization (NMF), etc. Different from PCA and SVD, NMF seeks two nonnegative matrices whose product is approximately close to the original matrix. More specifically, given a nonnegative matrix $X \in \mathbb{R}^{p \times n}$ and $r \ll \min(p, n)$, X is approximately factorized into two nonnegative matrices $F \in \mathbb{R}^{p \times r}$ and $G \in \mathbb{R}^{r \times n}$. Thus, the original data point $x_i$ in the p-dimensional space is projected to $g_i$ in a lower r-dimensional subspace defined by the columns of F.
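For concreteness, the following is a minimal sketch of this factorization using the classical Frobenius-norm multiplicative updates of Lee and Seung (not the ENMF algorithm proposed in this paper; all names and parameters are illustrative):

    import numpy as np

    def nmf(X, r, n_iter=200, eps=1e-10):
        """Classical NMF with Frobenius-norm multiplicative updates.

        X : (p, n) nonnegative data matrix, one data point per column.
        r : target rank with r << min(p, n).
        Returns F of shape (p, r) and G of shape (r, n) with X ~= F @ G.
        """
        p, n = X.shape
        rng = np.random.default_rng(0)
        F = rng.random((p, r))
        G = rng.random((r, n))
        for _ in range(n_iter):
            # Multiplicative updates preserve nonnegativity of F and G.
            G *= (F.T @ X) / (F.T @ F @ G + eps)
            F *= (X @ G.T) / (F @ G @ G.T + eps)
        return F, G

Each column x_i of X is then represented by the coefficient vector g_i = G[:, i] in the r-dimensional subspace spanned by the columns of F.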

Since the initial work by Lee and Seung [1], recent years have seen expanding research on NMF. NMF has demonstrated its advantages in a variety of areas, such as document clustering [2], [3], multimedia data analysis [4], microarray data analysis [5], social network analysis [6], [7], single channel source separation [8], [9], [10], [11], visual tracking [12], audio source separation [13], [14], detecting topic hierarchies [15], graph matching [16], etc. NMF has also been extended algorithmically to accommodate a number of data analysis problems, e.g., consensus clustering [17], balanced clustering [18], [19], semi-supervised clustering [20], [21], classification [22], [23], multiple-domain learning [24], multi-kernel learning [25] and collaborative filtering [26].

Standard NMF, which utilizes the least square error function to measure the quality of factorization, is ideal for zero-mean Gaussian noise. However, data in many real-world applications may not satisfy this assumption. It has been proved that standard NMF is sensitive to outliers [27], which can have a significant impact on an objective function based on squared residual error. Robust matrix factorization with the ℓ1 norm in [28] reduces the negative impact posed by noise, but it is unable to preserve feature rotation invariance, which is required by many applications. Robust nonnegative matrix factorization using the ℓ2,1-norm (ℓ2,1-NMF) in [27] is ideal under the assumption of an i.i.d. Laplacian noise model. [29] utilizes a robust capped norm to handle extreme outliers. Several robust versions of NMF are proposed in [30], including NMF based on the Correntropy Induced Metric (CIM-NMF), row-based CIM-NMF (rCIM-NMF), and NMF based on the Huber function (Huber-NMF).

For many data sets in data mining and machine learning, there is a low dimensional manifold embedded in the high dimensional original space. Manifold learning algorithms including Locally Linear Embedding (LLE) [31], Isometric Mapping (ISOMAP) [32], Laplacian Eigenmap (LE) [33] and Locality Preserving Projections (LPP) [34] aim to detect the underlying manifold structure to improve learning performance. Several graph based clustering methods [35], [36], [37] have shown their effectiveness by exploiting the locally geometric structure of data. By learning a similarity graph with exactly r connected components (where r equals the number of clusters), the Constrained Laplacian Rank (CLR) [38] and Clustering with Adaptive Neighbors (CAN) [39] methods exhibit excellent performance. Graph Regularized NMF (GNMF) in [40] seeks a matrix factorization which respects the intrinsic geometry of the data. Rather than using a fixed graph as GNMF does, AdapGrNMF in [41] regularizes NMF with an adaptive graph constructed based on feature selection results. MultiGrNMF in [42] approximates the intrinsic manifold by a linear combination of several graphs with different models and parameters. The graph dual regularization nonnegative matrix factorization (DNMF) in [43] considers the geometric structures of the data manifold and the feature manifold together. The global discriminative-based nonnegative spectral clustering methods in [44] integrate the geometrical structure and discriminative structure in a joint framework. RMNMF in [45] integrates ℓ2,1-NMF and spectral clustering with an additional orthogonal constraint. In these algorithms, the geometrical information is usually encoded by an affinity graph, whose vertices represent the data points and whose edge weights indicate the affinity between data pairs. Meanwhile, the local invariance idea [46] is adopted, i.e., points that are nearby in the high dimensional original space are likely to remain close to each other in the low dimensional manifold.
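As an illustration of how such graph-based regularizers are commonly built, here is a minimal sketch under generic assumptions (GNMF-style k-NN graph with heat-kernel weights; the concrete graph construction of ENMF, which additionally feeds back distances in the manifold, is defined later in the paper, and all names and parameters below are illustrative):

    import numpy as np

    def heat_kernel_graph(X, k=5, sigma=1.0):
        """k-NN affinity graph with heat-kernel weights, a common way of
        encoding the local geometry of the data.

        X : (p, n) data matrix, one data point per column.
        Returns the (n, n) weight matrix W and the graph Laplacian L = D - W.
        """
        n = X.shape[1]
        # Pairwise squared Euclidean distances between data points.
        sq = np.sum(X ** 2, axis=0)
        D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)

        W = np.zeros((n, n))
        for i in range(n):
            idx = np.argsort(D2[i])[1:k + 1]   # k nearest neighbors, skipping the point itself
            W[i, idx] = np.exp(-D2[i, idx] / (2.0 * sigma ** 2))
        W = np.maximum(W, W.T)                 # symmetrize the graph

        L = np.diag(W.sum(axis=1)) - W         # unnormalized graph Laplacian
        return W, L

The manifold regularizer typically added to the factorization objective is tr(G L Gᵀ) = (1/2) Σ_ij W_ij ||g_i − g_j||², which encourages data points connected in the graph to have nearby low dimensional representations.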

Due to the nonnegativity constraint, NMF yields sparse basis and coefficient matrices which provide a parts-based representation. Such a parts-based representation is consistent with psychological and physiological evidence on the human brain [47], [48], [49]. However, in some cases, the sparsity obtained by NMF is insufficient. NMF with sparseness constraints (NMFsc) in [50] enhances the sparsity of the basis and coefficient matrices by explicitly setting both the ℓ1 and ℓ2 norms. Sparse NMF (SNMF) in [51] imposes sparsity at the intra-data-point level. The global Nonnegative Matrix Underapproximation (G-NMU) and recursive Nonnegative Matrix Underapproximation (R-NMU) methods in [52] utilize an underapproximation technique based on Lagrangian relaxation and provide sparse parts-based representations with low reconstruction error. The Logdet divergence based sparse NMF method (LDS-NMF) in [53] deals with the rank-deficiency problem and enhances the sparsity of coefficients using the standard LASSO regularization term.

In this paper, we propose a novel elastic loss which interpolates adaptively between the Frobenius norm and the ℓ2,1-norm. We then build an elastic NMF model (ENMF) guided by this elastic loss. To exploit the locally geometric structure of the data, ENMF incorporates a manifold regularization term. However, different from other graph-based algorithms, the affinity graph is constructed based not only on the high dimensional original space but also on the low dimensional manifold. Although most data points conform to the local invariance idea, there are a few exceptions, i.e., two data points may be close to each other in the high dimensional original space while the distance between them in the low dimensional manifold is very large. From this perspective, we take the geometric information of the projected data points in the manifold as feedback in order to reduce the edge weights between such exceptional data pairs in the affinity graph. Moreover, ENMF incorporates an exclusive LASSO regularization term to enhance the sparsity of coefficients. Note that coefficients within a single cluster, rather than coefficients across different clusters, should compete to survive; the exclusive LASSO term in the formulation is therefore naturally used to encourage intra-cluster competition while discouraging inter-cluster competition (see the sketch after this paragraph). Given all the above considerations, we derive the corresponding multiplicative updating algorithm (via an auxiliary function approach) and provide a rigorous analysis of its convergence and correctness.
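For intuition, a minimal sketch of one common form of the exclusive LASSO (ℓ1,2) penalty, here grouping coefficients by cluster, i.e. by row of G; the exact grouping and weighting used by ENMF are specified in the formulation in Section 2:

    import numpy as np

    def exclusive_lasso(G):
        """Exclusive LASSO penalty: sum over groups of the squared l1 norm
        inside each group. Treating each row of G (one cluster / basis
        direction) as a group, the l1 norm inside a group induces sparsity
        (intra-cluster competition), while the squared sum across groups
        acts like an l2 term and does not force competition between clusters.
        """
        return float(np.sum(np.abs(G).sum(axis=1) ** 2))

Under such a penalty, large coefficients within the same cluster compete with each other, so the "winner" stands out while the other coefficients in that cluster are driven toward zero.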

The contributions of our ENMF are summarized as follows.

  • ENMF introduces an elastic loss that leverages the advantages of both the Frobenius norm and the ℓ2,1 norm when the noise distribution is uncertain, making ENMF far less sensitive to noise and outliers.

  • ENMF takes the geometric information of the projected data points in the low dimensional manifold as feedback to construct the affinity graph, hence ENMF can handle the situation where a few exceptional data pairs are close in the original space but far away from each other in the manifold.

  • ENMF utilizes the exclusive LASSO to enhance the intra-cluster competition and therefore the “winner” is more likely to stand out while the “loser” tends to recede in a sparse manner.

  • ENMF provides consistently better clustering results on several well-known data sets as compared to standard NMF and several other variants of the NMF algorithm.

The rest of the paper is organized as follows. In Section 2, we present the formulation of our ENMF and its computational algorithm. In Section 3, we provide a rigorous analysis of the convergence of the algorithm. In Section 4, we prove that the converged solution satisfies the Karush-Kuhn-Tucker (KKT) condition, which further validates the correctness of the algorithm. In Section 5, we show experimental results on several well-known data sets, making comparisons with the k-means algorithm, the PCA k-means algorithm, the standard NMF algorithm and several other variants of the NMF algorithm. Finally, we conclude this paper.

Section snippets

Elastic loss

Given the observation $x_i \in \mathbb{R}^p$ (the $i$-th data point), NMF [1] decomposes it into a basis $F \in \mathbb{R}^{p \times r}$ and the representation $g_i \in \mathbb{R}^r$ given the basis $F$, i.e.,
$$\min_{F \geq 0,\; g_i \geq 0} z(x_i, F g_i).$$

There are several typical losses used for NMF, for example,
$$\text{least square loss:}\quad z_2(x_i, F g_i) = \sum_i \|x_i - F g_i\|^2,$$
$$\ell_{2,1}\ \text{loss:}\quad z_{21}(x_i, F g_i) = \sum_i \|x_i - F g_i\|,$$
$$\text{KL divergence:}\quad z_{kl}(x_i, F g_i) = \sum_{ij} \Big( X_{ij} \log \frac{X_{ij}}{(FG)_{ij}} - X_{ij} + (FG)_{ij} \Big).$$
The different loss functions have motivated different formulations of NMF, such as ℓ2,1-NMF [27], KL-divergence NMF [1], etc.
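A small numerical sketch of the three losses above (illustrative only, with generic NumPy names; the elastic loss proposed in this paper is introduced next):

    import numpy as np

    def least_square_loss(X, F, G):
        # Frobenius / least-square loss: sum_i ||x_i - F g_i||^2
        return float(np.sum((X - F @ G) ** 2))

    def l21_loss(X, F, G):
        # l_{2,1} loss: sum_i ||x_i - F g_i||, the l2 norm of each residual column
        return float(np.sum(np.linalg.norm(X - F @ G, axis=0)))

    def kl_divergence_loss(X, F, G, eps=1e-12):
        # Generalized KL divergence: sum_ij (X_ij log(X_ij / (FG)_ij) - X_ij + (FG)_ij)
        FG = F @ G
        return float(np.sum(X * np.log((X + eps) / (FG + eps)) - X + FG))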

Inspired by the idea of

Convergence of the algorithm

In this section, we provide the proof of the convergence of the algorithm, as stated in Theorem 3.

Theorem 3

(A) Updating G using the rule of Eq. (17) while fixing F, the objective function of Eq. (11) is nonincreasing. (B) Updating F using the rule of Eq. (16) while fixing G, the objective function of Eq. (11) is nonincreasing.

(A) and (B) of Theorem 3 will be proved respectively in the next two subsections.

Correctness of the algorithm

In this section, we prove that the converged solution satisfies the Karush-Kuhn-Tucker (KKT) condition of constrained optimization theory, which establishes the correctness of the algorithm. Since the proof of correctness of the algorithm w.r.t. F is similar to that w.r.t. G, the former is omitted due to space limitations.

Theorem 9

At convergence, the converged solution G* of the updating rule of Eq. (17) satisfies the KKT condition of the optimization theory.

Proof

The KKT condition

Experiment

In this section, we empirically evaluate the proposed ENMF algorithm by comparing it with several other clustering algorithms on 9 data sets.

Conclusion and future work

In this paper, we propose an elastic NMF model. Our ENMF utilizes an elastic loss function interpolated between the Frobenius norm and the ℓ2,1-norm to measure the quality of factorization, making it significantly less sensitive to noise and outliers. Also, our ENMF takes the geometric information of the data points in the manifold as feedback to construct the affinity graph. In addition, our ENMF achieves sparsity at the intra-cluster level via the exclusive LASSO regularization term. We also


References (65)

  • N. Gillis et al., Using underapproximations for sparse nonnegative matrix factorization, Pattern Recognit. (2010)
  • D.D. Lee et al., Algorithms for non-negative matrix factorization, Advances in Neural Information Processing Systems (2001)
  • W. Xu et al., Document clustering based on non-negative matrix factorization, Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2003)
  • V.P. Pauca et al., Text mining using non-negative matrix factorizations, Proceedings of the 2004 SIAM International Conference on Data Mining (2004)
  • M. Cooper et al., Summarizing video using non-negative similarity matrix factorization, Proceedings of the 2002 IEEE Workshop on Multimedia Signal Processing (2002)
  • H. Kim et al., Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis, Bioinformatics (2007)
  • S. Zhang et al., Learning from incomplete ratings using non-negative matrix factorization, Proceedings of the 2006 SIAM International Conference on Data Mining (2006)
  • Y. Chi et al., Probabilistic polyadic factorization and its application to personalized recommendation, Proceedings of the 17th ACM Conference on Information and Knowledge Management (2008)
  • B. Gao et al., Machine learning source separation using maximum a posteriori nonnegative matrix factorization, IEEE Trans. Cybern. (2014)
  • B. Gao et al., Variational regularized 2-d nonnegative matrix factorization, IEEE Trans. Neural Netw. Learn. Syst. (2012)
  • P. Parathai et al., Single-channel blind separation using ℓ1-sparse complex non-negative matrix factorization for acoustic signals, J. Acoust. Soc. Am. (2015)
  • B. Gao et al., Adaptive sparsity non-negative matrix factorization for single-channel source separation, IEEE J. Sel. Top. Signal Process. (2011)
  • Y. Wu et al., Visual tracking via online nonnegative matrix factorization, IEEE Trans. Circuits Syst. Video Technol. (2014)
  • A. Al-Tmeme et al., Underdetermined convolutive source separation using GEM-MU with variational approximated optimum model order NMF2d, IEEE/ACM Trans. Audio Speech Lang. Process. (2017)
  • A. Al-Tmeme et al., Underdetermined reverberant acoustic source separation using weighted full-rank nonnegative tensor models, J. Acoust. Soc. Am. (2015)
  • T. Li et al., Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization, Proceedings of the Seventh IEEE International Conference on Data Mining (ICDM 2007) (2007)
  • H. Liu et al., Balanced clustering with least square regression, AAAI (2017)
  • Z. Li et al., Balanced clustering via exclusive lasso: a pragmatic approach, AAAI (2018)
  • F. Wang et al., Semi-supervised clustering via matrix factorization, Proceedings of the 2008 SIAM International Conference on Data Mining (2008)
  • H. Liu et al., Constrained nonnegative matrix factorization for image representation, IEEE Trans. Pattern Anal. Mach. Intell. (2012)
  • F. Sha et al., Multiplicative updates for nonnegative quadratic programming in support vector machines, Advances in Neural Information Processing Systems (2003)
  • N. Srebro et al., Maximum-margin matrix factorization, Advances in Neural Information Processing Systems (2005)

He Xiong received the BS degree in automation from Hefei University of Technology, China, in 2009 and the master’s degree in automation from the University of Science and Technology of China, in 2013. He is currently a lecturer in the Department of Computer Science at Bengbu University. His research interests include machine learning, data mining and computer vision.

Deguang Kong received his Ph.D. degree in Computer Science from the University of Texas at Arlington in 2013. He is currently a senior research scientist (principal data scientist) at Yahoo Research (Sunnyvale), and previously worked at Los Alamos National Lab, NEC research lab, Penn State University and Samsung Research America as a researcher. His research interests focus on feature learning and compressive sensing, user engagement understanding and recommendation, etc. He has published over 30 refereed articles in top conferences, including ICML, NIPS, AAAI, CVPR, KDD, ICDM, SDM, WSDM, CIKM, ECML/PKDD, etc. He has served as a program committee member for NIPS, AAAI, IJCAI, KDD, SDM and as a reviewer for TPAMI, TKDE, DMKD, TIFS, TNNLS, TDSC, etc.
