Discriminative Orthogonal Nonnegative matrix factorization with flexibility for data representation

https://doi.org/10.1016/j.eswa.2013.08.026

Highlights

  • We propose the Discriminative Orthogonal NMF method for data representation.

  • Our method respects the locally geometrical structure of the data.

  • Our method employs the global discriminant information of the data.

  • We make the method more adaptive via flexible orthogonality regularization.

  • Extensive experiments suggest the superiority of the proposed method.

Abstract

Learning an informative data representation is of vital importance in multidisciplinary applications, e.g., face analysis, document clustering and collaborative filtering. As a very useful tool, Nonnegative matrix factorization (NMF) is often employed to learn a well-structured data representation. While the geometrical structure of the data has been studied in some previous NMF variants, the existing works typically neglect the discriminant information revealed by the between-class scatter and the total scatter of the data. To address this issue, we present a novel approach named Discriminative Orthogonal Nonnegative matrix factorization (DON), which preserves both the local manifold structure and the global discriminant information simultaneously through manifold discriminant learning. In particular, to learn the discriminant structure for the data representation, we introduce the scaled indicator matrix, which naturally satisfies the orthogonality condition. Thus, we impose orthogonality constraints on the objective function. However, overly strict constraints lead to a very sparse data representation, which is undesirable in practice, so we further make this orthogonality flexible. In addition, we provide the optimization framework with a convergence proof for the updating rules. Extensive comparisons with several state-of-the-art approaches demonstrate the efficacy of the proposed method.

Introduction

Data representation is a fundamental problem in a broad range of practical applications, such as document clustering, face analysis and collaborative filtering. For example, in document analysis, it is ideal to capture a well-structured data representation for grouping the documents into appropriate clusters more effectively (Xu, Liu, & Gong, 2003). In face analysis, a learned data representation that considers the local geometrical structure of the data can reflect the local properties of the face images, thus yielding better recognition accuracy (He, Yan, Hu, Niyogi, & Zhang, 2005). Further, in collaborative filtering, we can learn interpretable low-dimensional representations for users and items to improve the recommendation accuracy (Gu, Zhou, & Ding, 2010).

In order to obtain a desirable data representation, we can make use of dimensionality reduction methods or matrix factorization techniques, such as principal component analysis (PCA) (Abdi & Williams, 2010), singular value decomposition (SVD) (Wall, Rechtsteiner, & Rocha, 2003) and nonnegative matrix factorization (NMF) (Lee & Seung, 2001). They share an important property: embedding data points from a high-dimensional space into a much lower-dimensional one, which greatly reduces the computational overhead of the succeeding tasks. In this work, we focus on unsupervised nonnegative matrix factorization, since it is more appropriate for nonnegative data, e.g., text documents, handwritten digit images, user ratings in collaborative filtering, etc.

Recently, NMF has established itself as one of the most powerful tools for learning a good data representation in data mining and pattern recognition (Xu et al., 2003, Greene et al., 2008, Zhang et al., 2011). In particular, NMF aims to find a linear approximation to the original matrix using two nonnegative matrices, all entries of which are constrained to be nonnegative. This nonnegative constraint leads to a parts-based representation because it allows only additive, not subtractive, combinations. Psychological and physiological evidence has also shown the existence of parts-based representations in human brains (Lee & Seung, 1999). Thanks to these advantages, NMF has received growing interest in the machine learning community, and several advanced extensions have been developed from different perspectives. The last decade has also witnessed the widespread application of manifold learning in various domains (Belkin et al., 2006, Belkin and Niyogi, 2001, Li et al., 2013a), e.g., face recognition, cancer diagnosis and document categorization. Hence, some researchers have successfully applied manifold learning to nonnegative matrix factorization by encoding the data points via a graph structure, making it possible to yield better data representations (Gu and Zhou, 2009, Cai et al., 2011b, Cai et al., 2011a). However, these graph-based approaches only emphasize the local geometrical structure and neglect the discriminant information. To capture the discriminant structure, an intuitive way is to utilize label information as prior knowledge to guide the learning process. Nevertheless, the data points often have no labels in many applications, which makes extracting the discriminant information a great challenge in this scenario. Recently, progress has been achieved in employing discriminant structural information under the unsupervised learning paradigm (De la Torre and Kanade, 2006, Ye et al., 2007), where the discriminant information is reflected by the between-class scatter and the total scatter of the data. Besides, some applications such as clustering tasks have shown the benefit of the discriminant information (Yang et al., 2010, Yang et al., 2011).
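The graph-based idea mentioned above can be illustrated with a toy implementation in the spirit of graph-regularized NMF (Cai et al., 2011a): a k-nearest-neighbor affinity matrix S and its Laplacian L = D - S add a smoothness term tr(H L Hᵀ) to the NMF objective. This is a simplified sketch under assumed choices (binary affinities, fixed regularization weight) and the helper names are hypothetical; it is not the authors' DON algorithm.

```python
import numpy as np

def knn_graph(X, k=5):
    """Binary k-nearest-neighbor affinity matrix for the columns (samples) of X."""
    n = X.shape[1]
    # Pairwise squared Euclidean distances between columns.
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]   # skip the point itself
        S[i, nbrs] = 1.0
    return np.maximum(S, S.T)               # symmetrize

def gnmf(X, r, lam=0.1, k=5, n_iter=300, eps=1e-10, seed=0):
    """Graph-regularized NMF sketch: X ~ W @ H plus lam * tr(H L H^T),
    where L = D - S is the graph Laplacian of the k-NN affinity S."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    S = knn_graph(X, k)
    D = np.diag(S.sum(axis=1))
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # The Laplacian splits as D - S: S enters the numerator,
        # D the denominator, keeping the update nonnegative.
        H *= (W.T @ X + lam * H @ S) / (W.T @ W @ H + lam * H @ D + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

X = np.random.default_rng(2).random((10, 40))
W, H = gnmf(X, r=4)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

The regularizer pulls the codes of neighboring samples together, which is exactly the local-structure preservation that the graph-based variants emphasize and that DON combines with discriminant information.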

Inspired by this, we present a novel approach called Discriminative Orthogonal Nonnegative matrix factorization (DON) to incorporate the discriminant information. Besides, motivated by the success of manifold learning (Belkin et al., 2006, Yuan et al., 2012), we also take into account the intrinsic geometrical structure of the data distribution. Our goal is to learn a well-structured data representation by jointly employing both the local manifold information and the global discriminant information. On the one hand, the new data representation preserves the intrinsic structure as much as possible by efficiently exploiting the local manifold information. On the other hand, the global discriminant information equips the learned data representation with discriminating power, i.e., the ability to differentiate data samples from different groups. Therefore, the proposed approach is expected to learn a good lower-dimensional data representation that not only preserves the local geometrical structure of the data space but also has global discriminant ability, both of which are beneficial for obtaining promising classification or clustering results. Meanwhile, to learn a discriminant structure for the new data representation, we introduce the scaled indicator matrix (Ye et al., 2007). It can be verified that this scaled indicator matrix satisfies the orthogonality condition. Thus, we impose orthogonality constraints on the objective function. However, hard orthogonality constraints lead to a very sparse data representation. In reality, a well-structured data representation should not be too sparse, so we make this orthogonality flexible to strike an optimal balance.
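The properties claimed above can be checked numerically: the scaled indicator matrix F = Y(YᵀY)^{-1/2} built from a 0/1 indicator Y is column-orthonormal, yet every row has exactly one nonzero entry, which is why a hard orthogonality constraint on a nonnegative factor forces extreme sparsity. The cluster labels below are hypothetical, chosen only to make both properties visible.

```python
import numpy as np

# Hypothetical cluster assignment of 6 samples into 3 groups.
labels = np.array([0, 0, 1, 1, 1, 2])
n, c = labels.size, labels.max() + 1

Y = np.zeros((n, c))
Y[np.arange(n), labels] = 1.0            # 0/1 indicator matrix

# Scaled indicator matrix F = Y (Y^T Y)^{-1/2}; since Y^T Y is diagonal
# with the cluster sizes, the scaling is a per-column division.
F = Y / np.sqrt(Y.sum(axis=0))

# F is orthonormal: F^T F = I ...
ortho = np.allclose(F.T @ F, np.eye(c))

# ... but each row has exactly one nonzero: nonnegative columns can only
# be orthogonal if their supports are disjoint, hence the extreme
# sparsity that motivates relaxing the constraint to a flexible penalty.
row_nnz = (F > 0).sum(axis=1)
```

This disjoint-support argument holds for any nonnegative matrix with orthogonal columns, so softening the constraint is the natural way to keep some orthogonality while avoiding an overly sparse representation.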

It is worthwhile to highlight the main contributions of this work as follows.

  • We propose a novel matrix factorization approach named Discriminative Orthogonal Nonnegative matrix factorization (DON) to derive better data representations in the lower-dimensional data space. This approach employs both the local and global information in the data distribution through manifold discriminant learning.

  • We impose the flexible orthogonality constraints onto the objective function, such that the sparseness of the obtained data representation can be governed to satisfy the requirements in practice.

  • Extensive experiments were conducted to examine the clustering performance of our approach and several state-of-the-art methods. Results on several real-world databases suggest the efficacy of the proposed method.

The remainder of this paper is organized as follows. Section 2 briefly reviews some works related to our approach. In Section 3, we introduce the proposed DON approach followed by the optimization framework as well as some discussions in Section 4. Experimental results are reported in Section 5 with rigorous analysis. Finally, we provide the concluding remarks and future works in Section 6.

Related work

In this section, we briefly review some recent works closely related to our proposed method.

Over the past years, nonnegative matrix factorization has received widespread attention in the field of data mining and pattern recognition. To this end, many of its variants have emerged to address different problems (Xu and Gong, 2004, Hoyer, 2004, Long et al., 2005, Ouhsain and Hamza, 2009, Huang et al., 2011, Kim et al., 2011, Li et al., 2012). For instance, since NMF does not always result in

The proposed DON approach

In this section, we mainly introduce our approach, i.e., Discriminative Orthogonal Nonnegative matrix factorization (DON). First, we deduce the objective function from manifold discriminant learning. Second, we present the optimization framework to solve the objective function and the convergence proof of the multiplicative update algorithm. In the end, we give a brief analysis on the computational complexity of the proposed method. Prior to our approach, we begin with a short review of NMF.

Discussions

In this section, we aim to further explore the performance of the proposed DON approach.

Similar to other matrix factorization techniques (Cai et al., 2011a, Cai et al., 2011b), our approach is able to easily derive a good data representation from updating the two coefficient matrices alternatively. However, the learned new lower-dimensional data representation is different from that derived from other existing methods. In particular, the data representation obtained by using our approach, not

Experiments

In this section, we carry out extensive experiments to examine the clustering performance of our approach in comparison with some state-of-the-art methods.

Conclusion and future works

In this paper, we have proposed a novel approach called Discriminative Orthogonal Nonnegative matrix factorization (DON). It aims to learn a new lower-dimensional data representation that preserves both the local geometrical structure and the global discriminant information by using the manifold discriminant learning.

For the local manifold learning, nearby data points are assumed to be close in the new data space, which is always the case in many real-world applications, thus preserving the

Acknowledgments

This work was supported in part by National Natural Science Foundation of China under Grants 91120302, 61222207, 61173185 and 61173186, National Basic Research Program of China (973 Program) under Grant 2013CB336500, National Key Technology R&D Program under Grant 2012BAI34B01, the Fundamental Research Funds for the Central Universities under Grant 2013FZA5012 and the Zhejiang Province Key S&T Innovation Group Project under Grant 2009R50009.

References (48)

  • M. Belkin et al.

    Manifold regularization: a geometric framework for learning from examples

    Journal of Machine Learning Research

    (2006)
  • M. Belkin et al.

    Laplacian eigenmaps and spectral techniques for embedding and clustering

    Advances in Neural Information Processing Systems

    (2001)
  • D. Cai et al.

    Locally consistent concept factorization for document clustering

    IEEE Transactions on Knowledge and Data Engineering

    (2011)
  • D. Cai et al.

    Graph regularized nonnegative matrix factorization for data representation

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2011)
  • W.-Y. Chen et al.

    Parallel spectral clustering in distributed systems

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2011)
  • F. De la Torre et al.

    Discriminative cluster analysis

  • A. Dempster et al.

    Maximum likelihood from incomplete data via the EM algorithm

    Journal of the Royal Statistical Society. Series B (Methodological)

    (1977)
  • C. Ding et al.

    Orthogonal nonnegative matrix t-factorizations for clustering

  • C. Ding et al.

    Convex and semi-nonnegative matrix factorizations

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2010)
  • D. Greene et al.

    Ensemble non-negative matrix factorization methods for clustering protein–protein interactions

    Bioinformatics

    (2008)
  • Gu, Q., Zhou, J. (2009). Local learning regularized nonnegative matrix factorization. In Proceedings of the 21st...
  • Gu, Q., Zhou, J., Ding, C. (2010). Collaborative filtering: weighted nonnegative matrix factorization incorporating...
  • X. He et al.

    Face recognition using laplacianfaces

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2005)
  • P. Hoyer

    Non-negative matrix factorization with sparseness constraints

    Journal of Machine Learning Research

    (2004)