Neurocomputing

Volume 71, Issues 1–3, December 2007, Pages 363–373

Multiplicative updates for non-negative projections

https://doi.org/10.1016/j.neucom.2006.11.023

Abstract

We present a method for constructing multiplicative update rules for non-negative projections based on Oja's iterative learning rule. Our method integrates the multiplicative normalization factor into the original additive update rule as an additional term which generally has a roughly opposite direction. As a consequence, the modified additive learning rule can easily be converted to its multiplicative version, which maintains non-negativity after each iteration. The derivation of our approach provides a sound interpretation of learning non-negative projection matrices based on iterative multiplicative updates: a kind of Hebbian learning with normalization. A convergence analysis is sketched by interpreting the multiplicative updates as a special case of natural gradient learning. We also demonstrate two application examples of the proposed technique: a non-negative variant of linear Hebbian networks and a non-negative Fisher discriminant analysis, including its kernel extension. The resulting example algorithms demonstrate interesting properties for data analysis tasks in experiments performed on facial images.

Introduction

Projecting high-dimensional input data onto a lower-dimensional subspace is a fundamental research topic in signal processing and pattern recognition. Non-negative projection is desired in many real-world applications, for example for images and spectra, where the original data are non-negative. However, most classical subspace approaches such as principal component analysis (PCA) and Fisher discriminant analysis (FDA), which are solved by singular value decomposition (SVD), fail to produce non-negative projections.

Recently, Lee and Seung [12], [13] introduced iterative multiplicative updates for non-negative optimization, based on decomposing the gradient of a given objective function. They applied the technique to non-negative matrix factorization (NMF), which seems to yield sparse representations. Several variants of NMF, e.g. [5], [22], [24], have since been proposed, in which the original NMF objective function is combined with various regularization terms. More recently, Yuan and Oja [23] presented a method called projective non-negative matrix factorization (P-NMF) that adds no extra terms but is derived directly from the objective function of PCA networks, with the projection constrained to be non-negative. The simulation results of P-NMF indicate that it can learn highly localized and non-overlapping part-based basis vectors. However, none of the above works explains why multiplicative updates can produce sparser and more localized basis components.
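
For concreteness, below is a minimal NumPy sketch of the well-known Lee–Seung multiplicative updates for the Euclidean NMF objective ‖V − WH‖²_F. The function name, the small constant eps, the iteration count and the random initialization are illustrative choices, not taken from the paper.

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for V ~ W @ H with V, W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        # Each factor is rescaled by a ratio of non-negative terms
        # (the negative and positive parts of the gradient), so the
        # iterates stay in the non-negative orthant.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```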

The multiplicative update rules of the above algorithms are based on decomposing the gradient of an objective function into positive and negative parts, one serving as the numerator and the other as the denominator. Nevertheless, such a method fails when the gradient does not decompose naturally into positive and negative parts. Sha et al. [21] proposed an alternative decomposition of the gradient and applied it to the minimization of a quadratic objective. However, this method cannot handle the situation where the gradient contains only a single positive (or negative) term. Furthermore, it is still unknown how to combine orthogonality or quadratic unit-norm constraints with this method.
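
Written out, the decomposition-based recipe that these works share is the following; this is a standard restatement rather than a rule specific to any one of the cited papers.

```latex
% Split the gradient into elementwise non-negative parts:
\nabla J(\mathbf{w}) \;=\; \nabla^{+} J(\mathbf{w}) \;-\; \nabla^{-} J(\mathbf{w}),
\qquad \nabla^{+} J \ge 0,\quad \nabla^{-} J \ge 0 \ \text{(elementwise)}.
% The multiplicative update for maximizing J over w_i >= 0 is then
w_i \;\leftarrow\; w_i\,
  \frac{\bigl[\nabla^{+} J(\mathbf{w})\bigr]_i}{\bigl[\nabla^{-} J(\mathbf{w})\bigr]_i}.
```

The ratio keeps every component non-negative, and the update is stationary exactly where the two parts balance, that is, where the gradient vanishes or the component has been driven to zero.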

In this paper we present a more general technique for reformulating a variety of existing additive learning algorithms into their multiplicative versions in order to produce non-negative projections. The derivation is based on Oja's rule [17], which integrates the normalization factor into the additive update rule. Therefore, our method provides a natural way to form the numerator and denominator in the multiplicative update rule even when external knowledge of a gradient decomposition is not available. Another major contribution of our approach is that its derivation also provides a sound interpretation of non-negative learning based on iterative multiplicative updates: a kind of Hebbian learning with normalization.

We demonstrate the applicability of the proposed method on two classical learning algorithms, PCA and FDA, as examples. In the unsupervised PCA case, our multiplicative implementation of linear Hebbian networks outperforms NMF in localized feature extraction, and its derivation offers an interpretation of why P-NMF can learn non-overlapping and localized basis vectors. In the supervised FDA case, our non-negative variant of linear discriminant analysis (LDA) can serve as a feature selector, and its kernel extension can reveal an underlying factor in the data and be used as a sample selector. The resulting algorithms are verified by experiments on facial image analysis with favorable results.

The remainder of the paper is organized as follows. First we introduce the basic idea of multiplicative update rules in Section 2. The non-negative projection problem is then formulated in Section 3. In Section 4 we review Oja's rule and show how to use it to form multiplicative update rules. The proposed method is applied to two examples in Section 5: one for unsupervised learning and the other for supervised learning. The experimental results of the resulting algorithms are presented in Section 6. Finally, conclusions are drawn in Section 7.

Section snippets

Multiplicative updates

Suppose there is an algorithm which seeks an m-dimensional solution w that maximizes an objective function J(w). The conventional additive update rule for such a problem is

w̃ = w + γ g(w),

where w̃ is the new value of w, γ a positive learning rate and the function g(w) outputs an m-dimensional vector which represents the learning direction, obtained e.g. from the gradient of the objective function. For notational brevity, we only discuss the learning for vectors in this section, but it is easy to
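
As a toy illustration of the additive rule above versus its multiplicative counterpart (a sketch not from the paper; the vectors and step size are made up for the example), one additive step can push components of w below zero, whereas the multiplicative step cannot:

```python
import numpy as np

# Toy learning direction g(w) = g_plus - g_minus with both parts non-negative.
w = np.array([0.2, 0.05, 1.0])
g_plus = np.array([0.1, 0.0, 0.3])
g_minus = np.array([0.05, 0.4, 0.1])
gamma = 0.5

# Additive step: the iterate may leave the non-negative orthant.
w_additive = w + gamma * (g_plus - g_minus)        # -> [0.225, -0.15, 1.1]

# Multiplicative step: rescaling by a ratio of non-negative terms keeps w >= 0.
w_multiplicative = w * g_plus / g_minus            # -> [0.4, 0.0, 3.0]
```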

Non-negative projection

Subspace projection methods are widely used in signal processing and pattern recognition. An r-dimensional subspace of R^m can be represented by an m×r orthogonal matrix W. In many applications one can write the objective function for selecting the projection matrix in the form

maximize_W  J(W) = (1/2) E{F(‖Wᵀv‖²)},    (8)

where v is an input vector, F a function from R to R, and E{·} denotes the expectation. For problems where F(x) = x, objective (8) can be simplified to

maximize_W  J(W) = (1/2) Tr(Wᵀ E{vvᵀ} W).

Such form
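
For F(x) = x the objective is the quadratic form above, and its gradient collapses to a single matrix product (using the symmetry of E{vvᵀ}; C is introduced here only as shorthand). When the data and W are non-negative this term is itself non-negative, which is why, as the next section notes, no natural two-term split of the gradient is available.

```latex
\frac{\partial J(\mathbf{W})}{\partial \mathbf{W}}
  \;=\; \mathrm{E}\{\mathbf{v}\mathbf{v}^{T}\}\,\mathbf{W}
  \;=\; \mathbf{C}\,\mathbf{W},
  \qquad \mathbf{C} := \mathrm{E}\{\mathbf{v}\mathbf{v}^{T}\}.
```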

Oja's rule in learning non-negative projections

The multiplicative update rule described in Section 2 maintains non-negativity. However, the gradient of the projection objective consists of a single term and does not provide any natural way to obtain g+ and g- (or G+ and G-). In this section we present a very simple approach that introduces an additional term for constructing the multiplicative update rules when the solution is constrained to have unit L2-norm or to be orthogonal.
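
As an illustration of this construction, here is a minimal NumPy sketch for the PCA-type objective of Section 3, under the assumption that the positive part is taken as CW and the Oja normalization term WWᵀCW serves as the roughly opposite additional term; the function name, iteration count, eps and initialization are illustrative, not from the paper.

```python
import numpy as np

def nonneg_oja_projection(X, r, n_iter=500, eps=1e-9, seed=0):
    """Illustrative multiplicative update for a non-negative projection W (m x r),
    assuming the split g+ = C W and g- = W W^T C W obtained by folding Oja's
    normalization term into the learning direction. X holds samples as columns."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    C = X @ X.T / X.shape[1]              # sample estimate of E{v v^T}
    W = rng.random((m, r))
    for _ in range(n_iter):
        CW = C @ W
        W *= CW / (W @ (W.T @ CW) + eps)  # multiplicative step keeps W >= 0
    return W
```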

First let us look at the projection on a one-dimensional subspace. In many

Examples

In this section, we apply the above reforming technique to two known projection methods, PCA and LDA. Before presenting the details, it should be emphasized that we are not aiming to produce new algorithms to replace the existing ones for reconstruction or classification. Instead, the main purpose of these examples is to demonstrate the applicability of the technique described in the previous section and to help readers gain more insight into the reforming procedure.

Experiments

We demonstrate here the empirical results of the non-negative algorithms presented in Sections 5.1 and 5.2 when applied to the processing of facial images. Before proceeding to the details, it should be emphasized that the goal of the non-negative version of a given algorithm usually differs from that of the original one. The resulting objective value of a non-negative algorithm is generally not as good as that of its unconstrained counterpart. However, readers should be aware that data analysis is not

Conclusions

We have presented a technique for constructing multiplicative updates for learning non-negative projections based on Oja's rule, including two examples of its application: reforming the conventional PCA and FDA into their non-negative variants. In the experiments on facial images, the non-negative projections learned using the novel iterative update rules have demonstrated interesting properties for data analysis tasks in and beyond reconstruction and classification.

It is still a challenging and


References (24)

  • S. Haykin, Neural Networks: A Comprehensive Foundation (1998).
  • A. Hyvärinen et al., Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces, Neural Comput. (2000).

Zhirong Yang received his Bachelor and Master degrees in Computer Science from Sun Yat-Sen University, Guangzhou, China, in 1999 and 2002, respectively. Presently he is a doctoral candidate at the Computer and Information Science Laboratory in Helsinki University of Technology. His research interests include machine learning, pattern recognition, computer vision, and multimedia retrieval.

Jorma Laaksonen received his Dr. of Science in Technology degree in 1997 from Helsinki University of Technology, Finland, where he is presently Academy Research Fellow of the Academy of Finland at the Laboratory of Computer and Information Science. He is an author of several journal and conference papers on pattern recognition, statistical classification, and neural networks. His research interests are in content-based information retrieval and recognition of handwriting. Dr. Laaksonen is an IEEE senior member, a founding member of the SOM and LVQ Programming Teams and the PicSOM Development Group, and a member of the International Association of Pattern Recognition (IAPR) Technical Committee 3: Neural Networks and Machine Learning.
