Signal Processing
Volume 108, March 2015, Pages 309-321

Non-informative hierarchical Bayesian inference for non-negative matrix factorization

https://doi.org/10.1016/j.sigpro.2014.09.004

Highlights

  • A non-informative hierarchical Bayesian non-negative matrix factorization (NHBNMF) algorithm is proposed.

  • The NHBNMF algorithm can automatically find a set of bases that are close to the ground-truth bases.

  • A non-informative parameter is employed to enable automatic basis determination.

  • NHBNMF achieves satisfactory performance on several kinds of datasets.

Abstract

Non-negative matrix factorization (NMF) is an intuitive, non-negative, and interpretable approximation method. The canonical NMF approach derives a set of basic components to represent the original data, while probabilistic NMF approaches introduce reasonable constraints to refine the canonical NMF model. However, neither can discover the ground-truth bases or determine the model order. In general, the model order of the basis matrix needs to be pre-defined, and it determines the capability and accuracy of data structure discovery. How to accurately infer the model order of the basis matrix, however, has not been well investigated. In this paper, we propose a method called non-informative hierarchical Bayesian non-negative matrix factorization (NHBNMF) to automatically determine the model order and discover the data structure. This is achieved through a hierarchical Bayesian inference model, the maximum a posteriori (MAP) criterion, and non-informative parameters. In the NHBNMF method, we first introduce a structure with two-level parameters to enable the entire model to approach the distributions of the ground-truth bases. We then use a non-informative parameter scheme to eliminate the hyper-parameter and enable automatic searching. Finally, the model order and ground-truth bases are discovered using the MAP criterion and L2-norm selection. Experiments are conducted on both synthetic and real-world datasets to show the effectiveness of our algorithm. The results demonstrate that our algorithm can accurately estimate the model order and discover the ground-truth bases. Even for the complicated FERET facial dataset, our algorithm still obtains interpretable bases and achieves satisfactory accuracy of model order estimation.

Introduction

Non-negative matrix factorization (NMF) has become a popular technique since it was proposed by Lee and Seung [1] in 1999. NMF has demonstrated its power and capabilities in many research fields, such as image and video processing [2], [3], [4], audio and acoustic signal processing [5], [6], [7], and text and semantic analysis [8], [9], [10]. NMF is widely applied due to its non-negative, interpretable, and part-based representation properties. As we know, there are no negative values in the physical world. Compared with principal component analysis (PCA) [11] and independent component analysis (ICA) [12], NMF adds the non-negativity constraint on all the elements; this is the most distinctive feature that makes NMF fit the physical world. In NMF, given a non-negative dataset $X$, we intend to find two non-negative factor matrices $W \in \mathbb{R}^{M \times K}$ and $H \in \mathbb{R}^{K \times N}$, called the basis matrix and the feature matrix, such that
$$X \approx WH \quad \text{s.t.} \quad W \geq 0,\; H \geq 0.$$
Here $K$ is an important parameter, and its value is the model order; $K$ usually satisfies the inequality $K \leq MN/(M+N)$.
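To make the factorization above concrete, the following minimal sketch implements canonical NMF with the classical multiplicative updates of Lee and Seung [1], which decrease the Frobenius error $\|X - WH\|_F^2$ while preserving non-negativity. The iteration count, the small constant eps, and the random initialization are illustrative assumptions, not choices made in this paper.

    import numpy as np

    def nmf(X, K, n_iter=200, eps=1e-9, seed=0):
        """Factorize non-negative X (M x N) as X ~ W @ H with W, H >= 0."""
        rng = np.random.default_rng(seed)
        M, N = X.shape
        W = rng.random((M, K))
        H = rng.random((K, N))
        for _ in range(n_iter):
            # Multiplicative updates keep every entry non-negative at each step.
            H *= (W.T @ X) / (W.T @ W @ H + eps)
            W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        return W, H

    # With M = 100 and N = 60, the model order should satisfy
    # K <= MN/(M+N) = 37.5, so K = 10 is admissible.
    X = np.abs(np.random.default_rng(1).normal(size=(100, 60)))
    W, H = nmf(X, K=10)
    print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))  # relative reconstruction error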

During the past several years, many variants of the NMF algorithm have been proposed to improve its performance. Most of the variants fall into two categories: one is sparseness-oriented, the other is manifold-oriented. The sparseness-oriented algorithms aim to enhance the sparseness of the basis by introducing certain constraints. Sparseness is consistent with the nature of the NMF algorithm, which is part-based representation. Sparseness in NMF differs from that in sparse linear regression: in sparse linear regression the sparseness acts only on H while W is fixed, whereas in NMF algorithms sparseness refers to the total number of coefficients required to encode the data. Typical algorithms in this category are the sparse NMF algorithms proposed in [13], [14], [15] and the localized NMF proposed in [16]. In comparison, manifold-oriented variants aim to find the low-dimensional manifold of the original dataset. Such algorithms often apply a graph-embedding approach to preserve the geometric information of the original data in a surrogate low-dimensional manifold. One typical algorithm is non-negative graph embedding [17].
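To make the notion of sparseness concrete, the snippet below computes the sparseness measure defined by Hoyer [13], $\sigma(x) = (\sqrt{n} - \|x\|_1/\|x\|_2)/(\sqrt{n} - 1)$ for a length-$n$ vector $x$: it equals 0 for a constant vector and 1 for a vector with a single non-zero entry. It is shown here only as a self-contained illustration of the quantity that sparse NMF constrains.

    import numpy as np

    def hoyer_sparseness(x):
        # sigma(x) = (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1), in [0, 1]
        n = x.size
        return (np.sqrt(n) - np.abs(x).sum() / np.sqrt((x ** 2).sum())) / (np.sqrt(n) - 1)

    print(hoyer_sparseness(np.ones(16)))    # 0.0: all coefficients equally active
    print(hoyer_sparseness(np.eye(16)[0]))  # 1.0: a single active coefficient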

Although the sparseness constraint and manifold learning can improve the performance of the NMF algorithm, the determination of the model order is even more important for improving NMF's performance. Unfortunately, this issue has not received sufficient attention and investigation.

From the machine learning and data mining perspective, we always attempt to extract the hidden structure of data. More accurate extraction of the hidden structure achieves better representation and recognition. On one hand, the hidden structure indicates the real composition of the data; on the other hand, it makes the factorization interpretable. For instance, suppose a human face can be represented by only four basic components: eyebrows, eyes, nose, and mouth; that is, these four basic components are the ground-truth bases for representing a face. If we can determine that the model order is 4 and can find the true bases, then we can accurately represent the face. Conversely, if we set the model order to some number other than 4, then we have to use other parts to represent the face. Since those parts are not intrinsic features of a face, it is impractical to use them to represent the face accurately. The model order of the factorized basis is the most important parameter for evaluating the accuracy of structure extraction. Furthermore, an accurate structure helps us better understand and analyze the data, thereby improving performance in applications.

The main challenge of the model order determination problem is that little prior knowledge is available, so it is hard to approach the real distributions of the bases; consequently, the real model order cannot be discovered. Usually, the model order and the cost function need to be pre-defined, and previous methods introduce no additional prior knowledge into the algorithm. That is why the canonical NMF method and traditional Bayesian methods (ML, MAP) cannot handle the model order determination problem. Although a fully Bayesian method is one way to achieve model order determination, its computational cost is too high. Moreover, the accuracy of this approach also depends on the hyper-parameter's distribution: if the chosen hyper-parameter distribution does not reflect the real conditions, we cannot obtain the expected results.

In order to overcome the dilemma between discovering the model order and incurring high computational cost, and motivated by the model order selection method used in Bayesian PCA [18], we propose a hierarchical Bayesian inference method (in which we introduce two-level parameters into the inference model) to seek the correct model order of the factorized basis. Furthermore, we utilize a non-informative prior as the parameter of the hyper-parameter (second-level parameter) to enable our model to approach the real distributions of the basis automatically. We then use the L2-norm as the selection function to obtain the value of the model order. Experimental results on three datasets demonstrate the effectiveness of our algorithm.
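As a rough illustration of this overall idea (not the exact NHBNMF updates derived later in the paper), the sketch below starts from a deliberately large K, attaches a per-column precision to the basis matrix in the spirit of automatic relevance determination in Bayesian PCA [18], and reads off the model order as the number of basis columns whose L2-norm survives. The penalized multiplicative update, the Jeffreys-type precision re-estimate, and the pruning threshold are all assumptions made for this sketch.

    import numpy as np

    def ard_nmf_order(X, K_max=20, n_iter=500, eps=1e-9, seed=0):
        """Over-complete NMF with per-column relevance weights; irrelevant
        basis columns are driven toward zero and pruned by L2-norm."""
        rng = np.random.default_rng(seed)
        M, N = X.shape
        W = rng.random((M, K_max))
        H = rng.random((K_max, N))
        lam = np.ones(K_max)  # precision of the Gaussian penalty on each column of W
        for _ in range(n_iter):
            H *= (W.T @ X) / (W.T @ W @ H + eps)
            # MAP-style multiplicative update for the penalized objective
            # ||X - WH||_F^2 / 2 + sum_k lam_k ||w_k||^2 / 2
            W *= (X @ H.T) / (W @ (H @ H.T) + W * lam + eps)
            # Re-estimate each precision from the column it governs; a Jeffreys-type
            # non-informative hyper-prior keeps this update free of tuning constants.
            lam = M / ((W ** 2).sum(axis=0) + eps)
        norms = np.linalg.norm(W, axis=0)
        keep = norms > 1e-3 * norms.max()  # L2-norm selection of surviving bases
        return W[:, keep], H[keep], int(keep.sum())  # estimated order = kept columns

Columns whose precision grows without bound are squeezed toward zero norm, so the surviving columns indicate both the model order and the recovered bases.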

The rest of this paper is organized as follows. Section 2 provides a brief review of related work on model order determination in NMF. In Section 3, we describe our non-informative hierarchical Bayesian inference algorithm in detail. The analysis and evaluation of experimental results are provided in Section 4. Section 5 concludes the paper.

Section snippets

Related works

Although sparseness optimization and manifold learning are different techniques, they are consistent with the part-based representation principle of NMF. To some extent, sparseness optimization, manifold learning, and model order determination are identical in spirit: each uses a subset of localized features or structures to represent the original data. As in localized non-negative matrix factorization (LNMF) [16], [19], [20], [21], some local features are learned to represent the data. While in projective

Hierarchical Bayesian modelling

In this work, we aim to establish a non-informative hierarchical Bayesian model that infers the ground-truth basis and accurately estimates the model order for non-negative matrix factorization. Moreover, such a model should be able to achieve this goal automatically. The structure of our non-informative hierarchical Bayesian model is shown in Fig. 1. Compared to basic Bayesian models, we incorporate a hyper-parameter level in our model. Hence, our hierarchical Bayesian model consists of three levels:
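Although the snippet above breaks off, a generic three-level hierarchy consistent with this description can be sketched; the factorization below and the Jeffreys-type form of the non-informative hyper-prior are illustrative assumptions, not necessarily the paper's exact distributions.

    \begin{align*}
      &\text{data level (likelihood):}   && p(X \mid W, H) \\
      &\text{parameter level (priors):}  && p(W \mid \lambda),\quad p(H) \\
      &\text{hyper-parameter level:}     && p(\lambda) \propto 1/\lambda
        \quad \text{(non-informative, Jeffreys-type)}
    \end{align*}
    % MAP inference then maximizes the joint posterior:
    \begin{equation*}
      p(W, H, \lambda \mid X) \;\propto\;
        p(X \mid W, H)\, p(W \mid \lambda)\, p(H)\, p(\lambda).
    \end{equation*}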

Dataset

We investigate three datasets to demonstrate the effectiveness of the proposed algorithm.

(1) Fence dataset: this dataset is composed of binary images, each of size 32×32. Each image consists of four row bars (of size 1×32) and four column bars (of size 32×1). In every image, the row bars and column bars are valued 1, while the other pixels are valued 0. The row bars and column bars randomly appear at the 8th, 15th, 22nd, and 29th positions in the horizontal and vertical directions
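Since the dataset description above is cut off before the randomness is fully specified, the generator below is only a hedged sketch: it assumes each of the four candidate row bars and four candidate column bars is switched on independently with probability 1/2, and that the stated positions are 1-indexed. The sample size of 500 is likewise an arbitrary illustration.

    import numpy as np

    def fence_image(rng):
        """One 32 x 32 binary fence image with bars at rows/columns 8, 15, 22, 29."""
        img = np.zeros((32, 32), dtype=np.uint8)
        for p in np.array([8, 15, 22, 29]) - 1:  # convert 1-indexed positions to 0-indexed
            if rng.random() < 0.5:
                img[p, :] = 1  # a 1 x 32 row bar
            if rng.random() < 0.5:
                img[:, p] = 1  # a 32 x 1 column bar
        return img

    rng = np.random.default_rng(0)
    # Stack vectorized images as columns: a 1024 x 500 non-negative data matrix X.
    X = np.stack([fence_image(rng).ravel() for _ in range(500)], axis=1)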

Conclusion

In this paper, we have presented a non-informative hierarchical Bayesian inference algorithm for non-negative matrix factorization, which is powerful and efficient in seeking the ground-truth basis and the correct order of a data model. This is achieved by introducing a hierarchical modelling structure and a non-informative hyper-parameter. The crucial point is that our algorithm is hyper-prior free; namely, we do not need to find an appropriate hyper-prior for the hyper-parameter layer. The experiment

References (36)

  • Y. Xue et al.

    Clustering-based initialization for non-negative matrix factorization

    Appl. Math. Comput.

    (2008)
  • D. Lee et al.

    Learning the parts of objects by non-negative matrix factorization

    Nature

    (1999)
  • Z. Yuan et al.

    Projective nonnegative matrix factorization for image compression and feature extraction

    Image Anal.

    (2005)
  • I. Kotsia et al.

    A novel discriminant non-negative matrix factorization algorithm with applications to facial image characterization problems

    IEEE Trans. Inf. Forensics Secur.

    (2007)
  • T. Zhang et al.

    Topology preserving non-negative matrix factorization for face recognition

    IEEE Trans. Image Process.

    (2008)
  • C. Févotte et al.

    Nonnegative matrix factorization with the Itakura–Saito divergence: with application to music analysis

    Neural Comput.

    (2009)
  • J.J. Carabias-Orti et al.

    Musical instrument sound multi-excitation model for non-negative spectrogram factorization

    IEEE J. Sel. Topics Signal Process.

    (2011)
  • N. Chen et al.

    Robust audio hashing based on discrete-wavelet-transform and non-negative matrix factorisation

    IET Commun.

    (2010)
  • M. Berry et al.

    Email surveillance using non-negative matrix factorization

    Comput. Math. Organ. Theory

    (2005)
  • Y. Yang, B. Hu, Pairwise constraints-guided non-negative matrix factorization for document clustering, in: IEEE/WIC/ACM...
  • F. Sun, K. Zhang, NMF-based method of text classification, in: 2010 8th World Congress on Intelligent Control and...
  • H. Abdi et al.

    Principal component analysis

    Wiley Interdiscip. Rev.: Comput. Stat.

    (2010)
  • E. Oja et al.

    Independent component analysis: algorithms and applications

    Neural Netw.

    (2000)
  • P. Hoyer

    Non-negative matrix factorization with sparseness constraints

    J. Mach. Learn. Res.

    (2004)
  • R. De Fréin, S. Rickard, Learning speech features in the presence of noise: sparse convolutive robust non-negative...
  • B. Gao et al.

    Adaptive sparsity non-negative matrix factorization for single channel source separation

    IEEE J. Sel. Topics Signal Process.

    (2011)
  • S. Li, X. Hou, H. Zhang, Q. Cheng, Learning spatially localized, parts-based representation, in: Proceedings of the...
  • J. Yang, S. Yang, Y. Fu, X. Li, T. Huang, Non-negative graph embedding, in: IEEE Conference on Computer Vision and...