Geometrically local isotropic independence and numerical analysis of the Mahalanobis metric in vector space
Introduction
The Mahalanobis distance (Mahalanobis, 1937), which is the distance normalized by the variance–covariance matrix of a distribution, is widely used for pattern recognition and data analysis. It often improves recognition accuracy compared with the Euclidean distance.
For example, if two vectors differ in a direction in which the variance is small, the difference is small in terms of the Euclidean distance but may be large from the viewpoint of probability (Fig. 1a).
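For concreteness, a minimal numerical sketch of this effect (the covariance values are chosen arbitrarily for illustration): a unit Euclidean step along the low-variance axis becomes twice as long under the Mahalanobis distance.

```python
import numpy as np

# Sigma: large variance along x, small variance along y (illustrative values).
Sigma = np.array([[4.0, 0.0],
                  [0.0, 0.25]])
Sigma_inv = np.linalg.inv(Sigma)

u = np.array([0.0, 0.0])
v = np.array([0.0, 1.0])   # offset along the low-variance axis

diff = u - v
d_euclid = np.linalg.norm(diff)
d_mahal = np.sqrt(diff @ Sigma_inv @ diff)

print(d_euclid)  # 1.0
print(d_mahal)   # 2.0, i.e. 1 / sqrt(0.25)
```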
When the distribution of patterns is normal, normalization by the variance–covariance matrix is reasonable. However, consider a probability distribution function (p.d.f.) that is not normal. The set of samples given in Fig. 1b is an example. There it is natural to evaluate the distance not along a straight line but along a curve. In order to realize such a distance, the Mahalanobis metric was proposed.
In the n-dimensional Euclidean space, the normal distribution with mean μ and variance–covariance matrix Σ is expressed by the p.d.f.
$$p(x) = (2\pi)^{-n/2}\,|\Sigma|^{-1/2}\exp\!\left(-\tfrac{1}{2}\langle x-\mu,\ \Sigma^{-1}(x-\mu)\rangle\right),$$
where $\langle\cdot,\cdot\rangle$ denotes the Euclidean inner product and Σ is assumed to be regular. Let I be the identity matrix. When x is mapped to y by the linear transformation $y=\Sigma^{-1/2}(x-\mu)$, p is transformed to the p.d.f. of a normal distribution whose variance–covariance matrix is I. The inner product that expresses the Mahalanobis distance is given as the pullback of the Euclidean inner product under this transformation,
$$\langle u,v\rangle_{\Sigma} = \langle \Sigma^{-1/2}u,\ \Sigma^{-1/2}v\rangle = u^{\top}\Sigma^{-1}v$$
(Fig. 2).
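The pullback relation can be checked numerically: under the whitening map the Mahalanobis distance between two points equals the ordinary Euclidean distance between their images. A sketch with arbitrary example values for μ and Σ:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Inverse symmetric square root of Sigma via its eigendecomposition.
w, V = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T

x1, x2 = rng.normal(size=2), rng.normal(size=2)

# Mahalanobis distance in the original coordinates ...
diff = x1 - x2
d_mahal = np.sqrt(diff @ np.linalg.inv(Sigma) @ diff)

# ... equals the Euclidean distance after whitening y = Sigma^{-1/2}(x - mu).
y1 = Sigma_inv_sqrt @ (x1 - mu)
y2 = Sigma_inv_sqrt @ (x2 - mu)
d_euclid = np.linalg.norm(y1 - y2)

print(np.isclose(d_mahal, d_euclid))  # True
```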
The concept of geometrically local isotropic independence (GLII) was proposed to define a normal distribution on a manifold (Yamashita et al., 2006). It yields a normal distribution whose variance–covariance matrix is aI in the Euclidean space, the von Mises–Fisher distribution on a hyper-spherical surface (Mardia, 1972, Mardia and Jupp, 1999), and a corresponding distribution in the Lobachevsky space (Alekseevskij et al., 1993). The Mahalanobis metric equation was defined by extending the linear transformation to a diffeomorphism and using the GLII equation (Yamashita et al., 2006, Son et al., 2008). Remarkably, this differential equation depends neither on the coordinate system nor on the metric originally defined in the space. Furthermore, the diffeomorphism disappears completely from the equation.
In one-dimensional statistical analysis, an approximate transformation to a normal distribution, such as the Box–Cox transformation, is often used. It has also been shown that classification accuracy improves when a gamma-distributed component of the feature vectors is transformed to a normal distribution (Shia et al., 2002). On the other hand, the Mahalanobis distance (whitening the variance–covariance of the data) is a common technique in pattern recognition. By measuring distance with the Mahalanobis metric, we can do both simultaneously for multi-dimensional data. Therefore, the Mahalanobis metric should improve the accuracy of statistical analysis and pattern recognition.
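As a side note, the one-dimensional Box–Cox transformation mentioned above has the closed form $y = (x^{\lambda}-1)/\lambda$ for $\lambda \neq 0$ and $y = \log x$ for $\lambda = 0$. A minimal sketch (the sample values are made up for illustration):

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform of positive data x with parameter lam."""
    x = np.asarray(x, dtype=float)
    if lam == 0.0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

# A right-skewed sample (gamma-like, illustrative); lam < 1 reduces the skew.
x = np.array([0.2, 0.5, 1.0, 2.0, 5.0, 12.0])
y = box_cox(x, 0.0)   # lam = 0 gives the log transform
print(y[2])  # 0.0, since log(1) = 0
```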
We explain the GLII and Mahalanobis metric equations and conduct experiments in which the Mahalanobis metric equation is solved by the Newton–Raphson method. The experiments show that the equation can be solved effectively even though its nonlinearity is very high. When a p.d.f. is estimated from sample data in a real case, it inevitably contains errors. To investigate the effect of such errors on the solution, we added an error to an original p.d.f. and solved the Mahalanobis metric equation.
In Section 2, the GLII is explained. In Section 3, the Mahalanobis metric equation is presented, and we show a solution of the GLII equation on a hyper-spherical surface, for which a proof is provided for the first time. In Section 4, a numerical analysis method for the Mahalanobis metric based on the Newton–Raphson method is proposed. In Section 5, experimental results are shown.
Characterizations of normal distribution in Euclidean space
Normal distributions can be characterized by the equality between the sample mean and the maximum likelihood estimator (C. F. Gauss), by isotropic independence (Maxwell), by the entropy, or by limits (Fry, 1965, Feller, 1968, Maistrov, 1974, Tien and Lienhard, 1985).
In the first characterization, Gauss showed that if a p.d.f. is given as $f(x-\mu)$, where μ is a location parameter, samples are extracted independently, and the maximum likelihood estimator of μ is always given by the sample mean of the samples, then f is necessarily a normal density.
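One direction of this characterization is a short computation: for the normal density itself, maximizing the likelihood in μ does yield the sample mean.

```latex
% Log-likelihood of m independent samples under f(x-\mu) \propto e^{-(x-\mu)^2/(2\sigma^2)}
\ell(\mu) = -\frac{1}{2\sigma^2}\sum_{i=1}^{m}(x_i-\mu)^2 + \mathrm{const},
\qquad
\frac{d\ell}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{m}(x_i-\mu) = 0
\;\Longrightarrow\;
\hat\mu = \frac{1}{m}\sum_{i=1}^{m} x_i .
```

Gauss's result is the converse: requiring this identity to hold for every sample forces f to be normal.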
Mahalanobis metric equation
For a p.d.f. p on a manifold M, suppose that a diffeomorphism T from M to another manifold transforms p to a solution of the GLII equation on the target manifold. Fig. 3 illustrates this transformation together with the coordinate systems of the two manifolds.
Then, the Mahalanobis metric in M is defined as the pullback of the metric of the target manifold and is given by the Mahalanobis metric equation, which involves the covariant differential derived from that metric. A scalar f is
Mahalanobis metric by the Newton–Raphson method
We provide an algorithm to solve the Mahalanobis metric equation by the Newton–Raphson method. Here, we discuss the case in which the dimension of the space is two. For brevity, function arguments are omitted in the notation. We expand the Mahalanobis metric Eq. (13) and regard its left-hand side as a nonlinear function of the unknown metric components. Then, we have:
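Since Eq. (13) itself is not reproduced in this excerpt, the following sketch shows only the generic Newton–Raphson iteration with a finite-difference Jacobian, applied to a made-up toy system; the same scheme applies to the discretized left-hand side of the metric equation.

```python
import numpy as np

def newton_raphson(F, z0, tol=1e-10, max_iter=50):
    """Solve F(z) = 0 by Newton-Raphson with a forward-difference Jacobian.

    F : R^n -> R^n. Here F is a stand-in; in the paper the same iteration is
    applied to the discretized Mahalanobis metric equation.
    """
    z = np.asarray(z0, dtype=float)
    n = z.size
    for _ in range(max_iter):
        Fz = F(z)
        if np.linalg.norm(Fz) < tol:
            break
        # Forward-difference Jacobian, built column by column.
        J = np.empty((n, n))
        h = 1e-7
        for j in range(n):
            dz = np.zeros(n)
            dz[j] = h
            J[:, j] = (F(z + dz) - Fz) / h
        z = z - np.linalg.solve(J, Fz)
    return z

# Hypothetical toy system for illustration only:
# x^2 + y^2 = 2 and x - y = 0, with root (1, 1) near the starting point.
F = lambda z: np.array([z[0] ** 2 + z[1] ** 2 - 2.0, z[0] - z[1]])
z_star = newton_raphson(F, [2.0, 0.5])
print(z_star)  # approximately [1, 1]
```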
Experimental results
Since the nonlinearity of the Mahalanobis metric equation is very high, we have to examine whether or not the equation can be solved from a given p.d.f. In this experiment, we show that it can be solved efficiently by the Newton–Raphson method.
First, we define a coordinate transform, parameterized by constants a and b.
Function (22) is an infinitely continuously differentiable function with compact support. It is sufficient if the second derivatives are
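For reference, the prototypical infinitely differentiable function with compact support is the bump function $\exp(-1/(1-r^2))$ on $|r|<1$; the paper's function (22) is a different concrete choice, so the following is only an illustrative stand-in.

```python
import numpy as np

def bump(r):
    """Standard C-infinity bump: exp(-1/(1 - r^2)) for |r| < 1, else 0.

    Illustrative only -- not the paper's function (22).
    """
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    inside = np.abs(r) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - r[inside] ** 2))
    return out

print(bump(np.array([0.0, 0.5, 1.0, 2.0])))
# value at 0 is exp(-1); outside (-1, 1) the function and all derivatives vanish
```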
Conclusion
We explained the GLII and Mahalanobis metric equations and presented experimental results of solving the Mahalanobis metric equation by the Newton–Raphson method. The results show that the Mahalanobis metric can be obtained even when error is added to the p.d.f.
For future work, methods to obtain the Mahalanobis metric directly from samples and in higher-dimensional spaces must be developed. Furthermore, an extension of the GLII equation to time-series data is necessary.
References (12)
- Alekseevskij et al., Spaces of Constant Curvature (1993)
- Feller (1968)
- Fry, Probability and its Engineering Uses (1965)
- Mahalanobis, Normalization of statistical variates and the use of rectangular co-ordinates in the theory of sampling distributions, Sankhyā (1937)
- Maistrov, Probability Theory, a Historical Sketch (1974)
- Mardia, Statistics of Directional Data (1972)