Non-informative hierarchical Bayesian inference for non-negative matrix factorization
Introduction
Non-negative matrix factorization (NMF) has become a popular technique since it was proposed by Lee and Seung [1] in 1999. NMF has demonstrated its power and capabilities in many research fields, such as image and video processing [2], [3], [4], audio and acoustic signal processing [5], [6], [7], and text and semantic analysis [8], [9], [10]. NMF is widely applied because of its non-negative, interpretable, and part-based representation properties. As we know, there are no negative values in the physical world. Compared with principal component analysis (PCA) [11] and independent component analysis (ICA) [12], NMF adds a non-negativity constraint on all the elements; this is the most impressive feature of NMF for fitting the physical world. In NMF, given a non-negative M×N data matrix X, we intend to find two non-negative factor matrices W (of size M×K) and H (of size K×N), named the basis matrix and the feature matrix, such that X ≈ WH. K is an important parameter here, and its value is the model order. Additionally, K usually satisfies the inequality K ≪ min(M, N).
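Under a squared-error cost, the classic multiplicative updates of Lee and Seung give a minimal working NMF. The following sketch (function name, iteration count, and initialization are illustrative choices, not taken from the paper) shows the factorization X ≈ WH with all entries kept non-negative:

```python
import numpy as np

def nmf_multiplicative(X, K, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix X (M x N) as X ~= W @ H.

    Uses the Lee-Seung multiplicative updates that decrease the squared
    Frobenius error ||X - WH||_F^2 while keeping every entry of
    W (M x K) and H (K x N) non-negative.
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, K)) + eps   # non-negative random initialization
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        # Ratios of non-negative terms, so non-negativity is preserved.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Usage: a small rank-2 non-negative matrix is recovered almost exactly.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 1.0]])
W, H = nmf_multiplicative(X, K=2)
print(np.linalg.norm(X - W @ H))  # small reconstruction error
```

Because the updates multiply by ratios of non-negative quantities, no explicit projection step is needed to enforce the constraint.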
During the past several years, many variants of the NMF algorithm have been proposed to improve its performance. Most of the variants fall into two categories: sparseness-oriented and manifold-oriented. The sparseness-oriented algorithms aim to enhance the sparseness of the basis by introducing certain constraints. Sparseness is consistent with the nature of the NMF algorithm, which is part-based representation. Sparseness in NMF algorithms differs from that in sparse linear regression: in sparse linear regression, the sparseness acts only on H, while the basis W is fixed; in NMF algorithms, sparseness refers to the total number of coefficients required to encode the data. Typical algorithms in this category are the sparse NMF algorithms proposed in [13], [14], [15] and the localized NMF proposed in [16]. In comparison, manifold-oriented variants aim to find the low-dimensional manifold of the original dataset. Such algorithms often apply a graph embedding approach to preserve the geometric information of the original data in the surrogate low-dimensional manifold. One typical algorithm is non-negative graph embedding [17].
Although the sparseness constraint and manifold learning can improve the performance of the NMF algorithm, the determination of the model order is even more important for improving NMF's performance. Unfortunately, this issue has not received sufficient attention and investigation.
From the machine learning and data mining perspective, we always attempt to extract the hidden structure of data. More accurate hidden-structure extraction achieves better representation and recognition. On one hand, the hidden structure indicates the real composition of the data; on the other hand, it makes the factorization interpretable. For instance, suppose a human face can be represented by only four basic components: eyebrows, eyes, nose and mouth; that is, these four basic components are the ground-truth bases for representing a face. If we can determine that the model order is 4 and can find the true bases, then we can accurately represent the face. On the contrary, if we set the model order to any number other than 4, then we have to use other parts to represent the face. Obviously, such parts are not the intrinsic features of a face, so it is not practical to use them to represent the face accurately. The model order of the factorized basis is the most important parameter for evaluating the accuracy of structure extraction. Furthermore, accurate structure helps us better understand and analyze the data, thus improving performance in applications.
The main challenge of the model order determination problem is that little prior knowledge is available, so it is hard to approach the real distributions of the bases. Consequently, the real model order cannot be discovered. Usually, the model order and cost function must be pre-defined, and previous methods introduce no additional prior knowledge into the algorithm. That is why the canonical NMF method and traditional Bayesian methods (ML, MAP) cannot handle the model order determination problem. Although the fully Bayesian method is one way to achieve model order determination, its computational cost is too high. Moreover, the accuracy of that approach also depends on the hyper-parameter's distribution: if the chosen distribution does not reflect the real conditions, we cannot obtain the expected results.
To overcome the dilemma between discovering the model order and the high computational cost, and motivated by the model order selection method used in Bayesian PCA [18], we propose a hierarchical Bayesian inference method, in which two levels of parameters are introduced into the inference model, to seek the correct model order of the factorized basis. Furthermore, we place a non-informative prior on the hyper-parameter (the second-level parameter) to enable our model to approach the real distributions of the basis automatically. We then use the L2-norm as the selection function to obtain the value of the model order. Experimental results on three datasets demonstrate the efficiency of our algorithm.
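The L2-norm selection function can be concretized as follows: after fitting with a generous K, a basis column whose norm has collapsed toward zero contributes nothing and is excluded from the model order count. The function below is a plausible sketch of this idea (the relative threshold is our own illustrative choice, not a value prescribed by the paper):

```python
import numpy as np

def effective_model_order(W, rel_tol=0.05):
    """Estimate the model order from a learned basis matrix W.

    A basis column whose L2-norm is negligible relative to the largest
    column is treated as switched off and excluded from the count.
    The relative tolerance rel_tol is an illustrative assumption.
    """
    norms = np.linalg.norm(W, axis=0)          # L2-norm of each basis column
    return int(np.sum(norms > rel_tol * norms.max()))

# Usage: two strong bases plus one numerically dead column -> order 2.
W = np.array([[1.0, 0.0, 1e-8],
              [0.0, 2.0, 1e-8],
              [1.0, 1.0, 1e-8]])
print(effective_model_order(W))  # 2
```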
The rest of this paper is organized as follows. Section 2 provides a brief review of related work on model order determination in NMF. In Section 3, we describe our non-informative hierarchical Bayesian inference algorithm in detail. The analysis and evaluation of experimental results are provided in Section 4. Section 5 concludes the paper.
Related works
Although sparseness optimization and manifold learning are different techniques, they are both consistent with the part-based representation principle of NMF. To some extent, sparseness optimization, manifold learning and model order determination are identical: all use a subset of localized features or structures to represent the original data. As in localized non-negative matrix factorization (LNMF) [16], [19], [20], [21], some local features are learned to represent data. While in projective
Hierarchical Bayesian modelling
In this work, we aim to establish a non-informative hierarchical Bayesian model to infer the ground-truth basis and an accurate estimate of the model order for non-negative matrix factorization. Moreover, such a model should be able to achieve this goal automatically. The structure of our non-informative hierarchical Bayesian model is shown in Fig. 1. Compared to basic Bayesian models, we incorporate an additional hyper-parameter level. Hence, our hierarchical Bayesian model consists of three levels:
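A minimal sketch of such a three-level hierarchy is given below. The concrete likelihood and prior families here are illustrative assumptions on our part; the paper's Fig. 1 defines the actual model:

```latex
% Level 1: likelihood of the data given the factors
X_{mn} \mid W, H \sim \mathcal{N}\!\Big(\textstyle\sum_{k=1}^{K} W_{mk} H_{kn},\; \sigma^2\Big)
% Level 2: non-negative (rectified) priors on the factors, governed by
% per-component precision hyper-parameters \alpha_k
W_{mk} \sim \mathcal{N}_{+}\!\big(0, \alpha_k^{-1}\big), \qquad
H_{kn} \sim \mathcal{N}_{+}\!\big(0, \alpha_k^{-1}\big)
% Level 3: non-informative (Jeffreys) hyper-prior, so each \alpha_k is
% learned from the data rather than fixed in advance; a large inferred
% \alpha_k drives basis column k toward zero, pruning the model order
p(\alpha_k) \propto 1/\alpha_k
```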
Dataset
We investigate three datasets to demonstrate the efficiency of the proposed algorithm.
(1) Fence dataset: it is composed of binary images, each of size 32×32. Each image consists of four row bars (each of size 1×32) and four column bars (each of size 32×1). In every image, the pixels of the row bars and column bars are valued 1, while all other pixels are valued 0. The row bars and column bars randomly appear at the 8th, 15th, 22nd and 29th positions in the horizontal direction and vertical direction
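The snippet is truncated, but the described images can be sketched as follows. The 0-based bar indices and the per-bar inclusion probability are our assumptions, since the exact randomization is not recoverable from the excerpt:

```python
import numpy as np

# The "8th, 15th, 22nd and 29th" rows/columns, interpreted here with
# 0-based indices (an assumption; the snippet is truncated).
BAR_POSITIONS = [7, 14, 21, 28]

def make_fence_image(rng, p=0.5):
    """Generate one 32x32 binary fence image.

    Each of the four candidate row bars (1x32) and four candidate
    column bars (32x1) is included independently with probability p;
    bar pixels are valued 1, all other pixels 0. The paper's exact
    randomization scheme is not recoverable from the excerpt.
    """
    img = np.zeros((32, 32), dtype=np.uint8)
    for r in BAR_POSITIONS:
        if rng.random() < p:
            img[r, :] = 1       # row bar
    for c in BAR_POSITIONS:
        if rng.random() < p:
            img[:, c] = 1       # column bar
    return img

# Usage: stack 100 vectorized images as columns of a non-negative data
# matrix X, ready for NMF; the ground-truth model order is 8 (the bars).
rng = np.random.default_rng(0)
X = np.stack([make_fence_image(rng).ravel() for _ in range(100)], axis=1)
print(X.shape)  # (1024, 100)
```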
Conclusion
In this paper, we have presented a non-informative hierarchical Bayesian inference algorithm for non-negative matrix factorization, which is powerful and efficient in seeking the ground-truth basis and the correct order of a data model. This is achieved by introducing a hierarchical modelling structure and a non-informative hyper-parameter. The crucial point is that our algorithm is hyper-prior free; namely, we do not need to find an appropriate hyper-prior for the hyper-parameter layer. The experiment
References (36)
- et al., Clustering-based initialization for non-negative matrix factorization, Appl. Math. Comput. (2008)
- et al., Learning the parts of objects by non-negative matrix factorization, Nature (1999)
- et al., Projective nonnegative matrix factorization for image compression and feature extraction, Image Anal. (2005)
- et al., A novel discriminant non-negative matrix factorization algorithm with applications to facial image characterization problems, IEEE Trans. Inf. Forensics Secur. (2007)
- et al., Topology preserving non-negative matrix factorization for face recognition, IEEE Trans. Image Process. (2008)
- et al., Nonnegative matrix factorization with the Itakura–Saito divergence: with application to music analysis, Neural Comput. (2009)
- et al., Musical instrument sound multi-excitation model for non-negative spectrogram factorization, IEEE J. Sel. Topics Signal Process. (2011)
- et al., Robust audio hashing based on discrete-wavelet-transform and non-negative matrix factorisation, IET Commun. (2010)
- et al., Email surveillance using non-negative matrix factorization, Comput. Math. Organ. Theory (2005)
- Y. Yang, B. Hu, Pairwise constraints-guided non-negative matrix factorization for document clustering, in: IEEE/WIC/ACM...
- Principal component analysis, Wiley Interdiscip. Rev.: Comput. Stat.
- Independent component analysis: algorithms and applications, Neural Netw.
- Non-negative matrix factorization with sparseness constraints, J. Mach. Learn. Res.
- Adaptive sparsity non-negative matrix factorization for single channel source separation, IEEE J. Sel. Topics Signal Process.