1 Introduction

Several models and theories of facial expressions have been proposed [2,3,4, 13, 22]. The well-known categorical theory, pioneered by [5], is lucid and powerful, and matched early engineering applications well. One reason is that a classification into basic expressions is easy for humans to understand. On the other hand, recent facial expression analysis demands higher sophistication and accuracy, e.g. to capture and understand ambiguous and even obscure expressions. In this respect, the dimensional theory, e.g. [9], represents the basic and other expressions in a spatial form, which paved the way to a quantitative representation of facial expressions. Unfortunately, the psychological spaces used in the dimensional theory were built from psychological evaluations by PCA or MDS, for which a direct correspondence with physical stimuli is difficult to find.

Recently, [1] showed how to construct a psychophysical facial expression space by measuring the JND (just noticeable difference) thresholds in the facial expression image space or its PCA subspaces. It turned out that the facial expression space is not a Euclidean space but a distorted or curved space, which can be represented as a Riemann space. It is theoretically interesting to find nontrivial properties of expression perception in terms of the geometry of this space. The fact is also meaningful in practice, since major facial expression recognition algorithms, including various neural networks, measure similarity by the straight-line distance between an input and templates in a feature space such as PCA, AU or certain layers of a neural network, under the tacit assumption that the space is Euclidean. In a Riemann space, however, the counterparts of straight lines are curves called geodesics, just as light traces curved paths around the sun owing to the spatial curvature in its gravitational field. In the facial expression space, therefore, distances should be measured along geodesics; algorithms that use straight-line distances are inaccurate in this sense and could lead to misrecognition.

In this paper we show how to transform this Riemann space into a Euclidean space while preserving the distance between any two stimuli, that is, while preserving the subjective difference in facial expression perception. Our approach is to construct an isometry, or distance-preserving map, from the Riemann space to a Euclidean space. To build this isometry we use a tool from Riemannian geometry called Riemann normal coordinates, which can be regarded as a generalized polar coordinate system.

We applied this approach to the JND data obtained in [1] to build a Riemann normal coordinate system in the facial expression space. The isometry is then readily obtained from this system and the polar coordinate system in the Euclidean space. To verify the isometry, we also mapped the JND thresholds into the Euclidean space, where they are close to unit circles. As an application, we also show results in facial expression recognition that differentiate subtle differences between facial expressions, which is difficult with traditional methods.

2 Spatial Representations of Facial Expressions

Representing objects in a spatial form, such as points or vectors in a space, is both useful and powerful. For example, inter-relationships and structures among different objects and their groups can be formalized and analyzed in terms of the geometric properties of the space. Novel insights and approaches can also be obtained from the mathematical tools available for a space with a particular geometry.

Facial expressions have a natural representation as vectors in the image space. In fact, in most facial expression recognition methods, including neural networks, the difference between facial images is evaluated by the Euclidean distance between the two image vectors. One problem with this approach is that the dimension of the space is very high. An approximate solution is to use principal subspaces obtained by e.g. PCA or MDS [12]. Another problem is that facial images are not facial expressions and contain much information irrelevant to expressions. The image space therefore serves only as the space of physical stimuli for subjective perception.

On the other hand, facial expressions have two major psychological representations: the categorical model and the dimensional model. The former, by Ekman et al. [5], is a discrete and linguistic representation, convenient for symbolic processing but without geometric or spatial attributes. The original dimensional theory, such as that of Woodworth and Schlosberg [6,7,8], was intended to find an order or geometric structure among these basic categories, and was then generalized to two- and three-dimensional models [9,10,11]. A major problem with the psychological spaces of the dimensional theory is that they are obtained from psychological evaluations in SD tests or the Affect Grid, in which a direct correspondence with physical stimuli is missing. Besides, the analog-looking coordinates obtained by PCA or MDS from discrete levels of relative evaluation do not provide a meaningful and reliable quantitative representation.

Therefore, a continuous and quantitative representation of facial expressions in a spatial form that has a direct correspondence with physical stimuli or facial images is desirable.

3 Facial Expression Space as a Riemannian Space

It was shown in [1] that a psychophysical space of facial expressions can be built from the facial image space by measuring the JND thresholds [14,15,16,17,18] of the expressions.

The JND discrimination thresholds in the facial expression space obtained in [1] are shown here again in Figs. 1, 2 and 3.

In particular, JND ellipsoids were measured for 7 basic expressions and 21 AU images from the Bosphorus database [19,20,21]. These produced 616 morphing sequences with 57844 images, and 2233 threshold points were used to fit 23 ellipsoids for a single observer. The 3D, 1st-2nd and 1st-3rd PCA projections of the results are shown in Figs. 1, 2 and 3, respectively.

Fig. 1. 23 ellipsoids in the 3D PCA space [1]

Fig. 2. 23 ellipses in the 1st-2nd PCA space [1]

Fig. 3. 23 ellipses in the 1st-3rd PCA space [1]

An obvious observation is the following. The JND thresholds at every expression in the space are supposed to be subjective unit spheres centered at that expression, so the variation in shape and size of the JND ellipsoids across expressions suggests that the facial expression space is not Euclidean but a distorted or curved space, in fact a Riemannian space.

Recall that a Riemannian space S is a space in which a metric tensor \(G(\varvec{x})=(g_{ij})\) is smoothly defined at every point \(\varvec{x}\in S\) such that the inner product between two infinitesimal shifts \(d\varvec{x}_1, d\varvec{x}_2\) from \(\varvec{x}\), or two vectors in the tangent space \(T_{\varvec{x}}S\), at \(\varvec{x}\) equals

$$ (d\varvec{x}_1, d\varvec{x}_2)= d\varvec{x}_1^TG(\varvec{x})d\varvec{x}_2= g_{ij}dx_1^i dx_2^j, \quad \forall d\varvec{x}_1,d\varvec{x}_2 \in T_{\varvec{x}}S $$

(Here and hereafter the Einstein summation convention is used.) The unit sphere centered at \(\varvec{x}\) is defined as

$$ ds^2= \Vert d\varvec{x} \Vert ^2=(d\varvec{x}, d\varvec{x})=d\varvec{x}^TG(\varvec{x})d\varvec{x}=1, \quad \forall d\varvec{x} \in T_{\varvec{x}}S $$

which, e.g. in 3D space, is actually an ellipsoid whose size and shape are determined by the Riemann metric \(G(\varvec{x})\).

Therefore the facial expression space obtained in [1] is a Riemann space. The JND thresholds then define the Riemann metric for the facial expression space.
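The relation between a metric and its JND ellipse can be illustrated numerically. The following is a minimal two-dimensional sketch, not the procedure of [1]: the function names `inner` and `jnd_ellipse` and the sample matrix `G` are hypothetical, chosen only for illustration. It evaluates the inner product \(d\varvec{x}_1^TG\,d\varvec{x}_2\) and recovers the semi-axes of the ellipse \(d\varvec{x}^TG\,d\varvec{x}=1\) from the eigenvalues of \(G\).

```python
import math

def inner(G, u, v):
    """Inner product u^T G v of two tangent vectors under the 2x2 metric G."""
    return sum(G[i][j] * u[i] * v[j] for i in range(2) for j in range(2))

def jnd_ellipse(G):
    """Semi-axes (major, minor) and major-axis angle of the ellipse dx^T G dx = 1."""
    a, b, c = G[0][0], G[0][1], G[1][1]
    mean = 0.5 * (a + c)
    half_gap = math.hypot(0.5 * (a - c), b)
    lam_max, lam_min = mean + half_gap, mean - half_gap  # eigenvalues of G
    if lam_min <= 0.0:
        raise ValueError("G must be symmetric positive definite")
    # The eigenvector of lam_max points along angle phi; the major axis of the
    # ellipse is the eigenvector of lam_min, which is perpendicular to it.
    phi = 0.5 * math.atan2(2.0 * b, a - c)
    return 1.0 / math.sqrt(lam_min), 1.0 / math.sqrt(lam_max), phi + 0.5 * math.pi

G = [[4.0, 0.0], [0.0, 1.0]]  # hypothetical metric at one expression
major, minor, angle = jnd_ellipse(G)
tip = (major * math.cos(angle), major * math.sin(angle))
print(major, minor, inner(G, tip, tip))  # the tip of the major axis has unit G-length
```

A large eigenvalue of \(G\) corresponds to a short axis of the ellipse, i.e. to a direction in which a small physical change is already noticeable.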

4 Riemann Spaces and Isometry Between Them

The first implication of the facial expression space being non-Euclidean is that the inter-relationships between different expressions could be much more complicated than expected and even counterintuitive. The second is that expression recognition algorithms that try to reduce the relative errors between an input image and a template expression could be misleading. Moreover, Riemannian geometry offers the prospect of discovering intriguing properties and novel models for analyzing and understanding our perception of facial expressions. Potential new applications to facial expression recognition are also expected.

To work toward this understanding of the facial expression space, we discuss here the possibility of transforming it into a Euclidean space.

4.1 Distances in Riemann Spaces and Isometries or Distance-Preserving Maps

Unlike a Euclidean space, in which distance is measured along straight lines, the distance between two points in a Riemannian space is defined as the length of the geodesic connecting them. As a generalization of the straight line in a Euclidean space, a geodesic u in a Riemann space is obtained as the solution of the differential equations (in the local coordinates \(u^i\))

$$\begin{aligned} \frac{d^{2}u^{i}}{ds^{2}}+\varGamma ^{i}_{jk}\frac{du^{j}}{ds}\frac{du^{k}}{ds}=0, \end{aligned}$$
(1)

where \(\varGamma ^{i}_{jk}\) is the Christoffel symbol defined as:

$$\begin{aligned} \varGamma ^{i}_{jk}= \frac{1}{2}g^{i\alpha }\left( \frac{\partial g_{\alpha j}}{\partial u^{k}}+\frac{\partial g_{\alpha k}}{\partial u^{j}}- \frac{\partial g_{jk}}{\partial u^{\alpha }}\right) . \end{aligned}$$
(2)

Here the metric tensor and its inverse are denoted by \(G=(g_{ij}),G^{-1}=(g^{ij})\).
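Equations (1) and (2) can be integrated numerically once the metric is available as a function of position. The following is a minimal two-dimensional sketch under stated assumptions, not the implementation used in this paper: the names `christoffel` and `shoot_geodesic` are hypothetical, the derivatives of \(g_{ij}\) are approximated by central differences, and a fixed-step fourth-order Runge-Kutta scheme stands in for a general ODE solver. The parameter \(s\) is arc length only when the initial velocity has unit length under the metric.

```python
import math

def christoffel(metric, u, h=1e-5):
    """Gamma[i][j][k] of Eq. (2) at u, using central differences of the metric."""
    g = metric(u)
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    ginv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    dg = []  # dg[k][a][b] approximates the derivative of g_ab with respect to u^k
    for k in range(2):
        up, dn = list(u), list(u)
        up[k] += h
        dn[k] -= h
        gp, gm = metric(up), metric(dn)
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(2)] for a in range(2)])
    return [[[0.5 * sum(ginv[i][a] * (dg[k][a][j] + dg[j][a][k] - dg[a][j][k])
                        for a in range(2))
              for k in range(2)] for j in range(2)] for i in range(2)]

def geodesic_rhs(metric, state):
    """First-order form of Eq. (1): state = [u1, u2, du1/ds, du2/ds]."""
    u, v = state[:2], state[2:]
    gam = christoffel(metric, u)
    acc = [-sum(gam[i][j][k] * v[j] * v[k] for j in range(2) for k in range(2))
           for i in range(2)]
    return [v[0], v[1], acc[0], acc[1]]

def shoot_geodesic(metric, u0, v0, length, steps=200):
    """End point of the geodesic from u0 with initial velocity v0, by RK4."""
    state = [u0[0], u0[1], v0[0], v0[1]]
    ds = length / steps
    for _ in range(steps):
        k1 = geodesic_rhs(metric, state)
        k2 = geodesic_rhs(metric, [x + 0.5 * ds * k for x, k in zip(state, k1)])
        k3 = geodesic_rhs(metric, [x + 0.5 * ds * k for x, k in zip(state, k2)])
        k4 = geodesic_rhs(metric, [x + ds * k for x, k in zip(state, k3)])
        state = [x + ds * (a + 2 * b + 2 * c + d) / 6
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state[:2]

# Check on a metric with a known geodesic (not facial expression data): for the
# half-plane metric g = I / y^2, the vertical unit-speed geodesic is y(s) = e^s.
half_plane = lambda u: [[1.0 / u[1] ** 2, 0.0], [0.0, 1.0 / u[1] ** 2]]
print(shoot_geodesic(half_plane, (0.0, 1.0), (0.0, 1.0), 1.0), math.e)
```

For a constant metric all Christoffel symbols vanish and the integrator returns a straight line, which is the Euclidean special case.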

Between two Riemannian spaces \(S_1\) and \(S_2\), an isometry, or distance-preserving map, \(f: S_1 \longrightarrow S_2: \varvec{y}=f(\varvec{x})\) is defined as a map such that

$$ \forall \varvec{x}_1, \varvec{x}_2 \in S_1, d(\varvec{x}_1, \varvec{x}_2)=d(\varvec{y}_1, \varvec{y}_2), \qquad \varvec{y}_i=f(\varvec{x}_i)\in S_2 $$

where \(d(\varvec{p}, \varvec{q})\) denotes the distance between \(\varvec{p}\) and \(\varvec{q}\). It is known that such a map preserves the Riemann metric at every point and therefore the geometry of the spaces [27, 28].
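In local coordinates, preservation of the metric by a smooth isometry can be written explicitly. The notation below is introduced here only for illustration: \(G_1\) and \(G_2\) denote the metric tensors of \(S_1\) and \(S_2\), and \(J\) the Jacobian matrix of \(f\).

$$ G_1(\varvec{x}) = J(\varvec{x})^T\, G_2(f(\varvec{x}))\, J(\varvec{x}), \qquad J(\varvec{x})=\left( \frac{\partial y^{i}}{\partial x^{j}}\right) $$

When \(S_2\) is a Euclidean space in Cartesian coordinates, \(G_2\) is the identity matrix and the condition reduces to \(G_1(\varvec{x})=J(\varvec{x})^TJ(\varvec{x})\). In that case the unit sphere \(d\varvec{x}^TG_1(\varvec{x})d\varvec{x}=1\) is mapped by \(d\varvec{y}=J\,d\varvec{x}\) to \(d\varvec{y}^Td\varvec{y}=1\), an ordinary unit sphere.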

4.2 Riemann Normal Coordinates

Here we show a way to build an isometry from the facial expression space to a Euclidean space using Riemann normal coordinates.

Riemann normal coordinates can be regarded as a generalization of the polar coordinate system in a Euclidean space. Choose a point \(\varvec{x}_0\in S\) as the center or origin. The Riemann normal coordinates of a point \(\varvec{x}\in S\) are given by the geodesic connecting \(\varvec{x}_0\) and \(\varvec{x}\): the direction of this geodesic at \(\varvec{x}_0\), i.e. its tangent vector there, has a spatial angle \(\varvec{\theta }\), and the length of the geodesic between the two points is the distance \(l=d(\varvec{x}_0, \varvec{x})\). The Riemann normal coordinates of \(\varvec{x}\) are defined as \((l, \varvec{\theta })\), which provides a map from the Riemann space S to a Euclidean space of the same dimension. Indeed, if one takes the origin of the Euclidean space as the image of \(\varvec{x}_0\), and the image of \(\varvec{x}\) as the point \(\varvec{y}\) in the Euclidean space with polar coordinates \((l, \varvec{\theta })\), then the map is an isometry.

5 Construction of Riemann Normal Coordinates

Computing a Riemann normal coordinate system can be difficult; in particular, it is costly to find the geodesic connecting two given points.

Here we use an efficient strategy to construct Riemann normal coordinates (for details see [25]).

After an appropriate choice of the origin \(\varvec{x}_0\in S\), we calculate geodesics emanating from \(\varvec{x}_0\) that are uniformly separated by a constant angular increment \(\varvec{\delta }\) (measured with the Riemann metric). A geodesic from a point in a specified direction is easy to find using an ODE solver. We then find the concentric circles, which are not geodesics but are easily obtained by connecting the points on adjacent geodesics that have the same distance r from the center \(\varvec{x}_0\). This gives a pre-calculated coordinate grid consisting of geodesics of angle \(\varvec{\theta }\) and concentric circles of radius r.

For a point \(\varvec{x}\in S\), the two parameters \((r, \varvec{\theta })\), the distance and the angle, provide unique coordinates in the space, which also correspond to the distance and angle of a polar coordinate system in a Euclidean space. If \(\varvec{x}\) does not lie exactly on a geodesic but between two adjacent geodesics with angles \(\varvec{\theta }\) and \(\varvec{\theta }+\varvec{\delta }\), one can either choose the angle of the geodesic closest to \(\varvec{x}\) or calculate a new geodesic with angle \(\varvec{\theta }+\varvec{\delta }/2\) starting from the origin \(\varvec{x}_0\). The angular resolution \(\varvec{\delta }\) can be predetermined as a trade-off between the required accuracy and the computational cost.

We thus obtain an isometry between a neighborhood of \(\varvec{x}_0\) in the Riemann space and the Euclidean space. In particular, a point \(\varvec{x}\in S\) has coordinates consisting of an angle \(\varvec{\theta }\) and the distance r from \(\varvec{x}_0\), whose values can be read directly from the corresponding grid of polar coordinates in the Euclidean space, as shown in Fig. 4. As a result, the ellipsoids in the Riemann space are transformed to unit circles in the Euclidean space, as shown in Fig. 5.

Fig. 4. Transformation from Riemann to Euclidean space and Riemann normal coordinates

Fig. 5. Locally the elliptic “unit circles” become real unit circles
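The grid construction of this section can be sketched in two dimensions as follows. This is an illustrative sketch under stated assumptions, not the procedure of [25]: all function names are hypothetical, the geodesics are integrated with a fixed-step Runge-Kutta scheme and finite-difference Christoffel symbols, and the lookup simply returns the nearest precomputed grid sample (nearest in the original coordinates, with no refinement by an extra geodesic at angle \(\varvec{\theta }+\varvec{\delta }/2\)). Angles at the origin are measured with the metric by using a frame that is orthonormal under \(G(\varvec{x}_0)\).

```python
import math

def christoffel(metric, u, h=1e-5):
    """Gamma[i][j][k] at u, using central differences of the metric."""
    g = metric(u)
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    ginv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    dg = []  # dg[k][a][b] approximates the derivative of g_ab with respect to u^k
    for k in range(2):
        up, dn = list(u), list(u)
        up[k] += h
        dn[k] -= h
        gp, gm = metric(up), metric(dn)
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(2)] for a in range(2)])
    return [[[0.5 * sum(ginv[i][a] * (dg[k][a][j] + dg[j][a][k] - dg[a][j][k])
                        for a in range(2))
              for k in range(2)] for j in range(2)] for i in range(2)]

def rk4_step(metric, state, ds):
    """One RK4 step of the geodesic equation; state = [u1, u2, v1, v2]."""
    def f(st):
        gam, v = christoffel(metric, st[:2]), st[2:]
        acc = [-sum(gam[i][j][k] * v[j] * v[k] for j in range(2) for k in range(2))
               for i in range(2)]
        return [v[0], v[1], acc[0], acc[1]]
    k1 = f(state)
    k2 = f([x + 0.5 * ds * k for x, k in zip(state, k1)])
    k3 = f([x + 0.5 * ds * k for x, k in zip(state, k2)])
    k4 = f([x + ds * k for x, k in zip(state, k3)])
    return [x + ds * (a + 2 * b + 2 * c + d) / 6
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)]

def build_grid(metric, x0, n_angles=36, r_max=1.0, n_radii=20, substeps=5):
    """Samples (r, theta, point) along geodesics shot from x0 at uniform angles."""
    g = metric(x0)
    a, b, c = g[0][0], g[0][1], g[1][1]
    # Gram-Schmidt frame (e1, e2) that is orthonormal with respect to G(x0).
    e1 = (1.0 / math.sqrt(a), 0.0)
    n2 = math.sqrt(c - b * b / a)
    e2 = (-b / (a * n2), 1.0 / n2)
    dr = r_max / n_radii
    grid = []
    for m in range(n_angles):
        theta = 2.0 * math.pi * m / n_angles
        ct, st = math.cos(theta), math.sin(theta)
        state = [x0[0], x0[1], ct * e1[0] + st * e2[0], ct * e1[1] + st * e2[1]]
        for n in range(1, n_radii + 1):
            for _ in range(substeps):
                state = rk4_step(metric, state, dr / substeps)
            grid.append((n * dr, theta, (state[0], state[1])))
    return grid

def normal_coords(grid, x):
    """(r, theta) of the grid sample nearest to x in the original coordinates."""
    r, theta, _ = min(grid, key=lambda s: (s[2][0] - x[0]) ** 2 + (s[2][1] - x[1]) ** 2)
    return r, theta

def to_euclidean(r, theta):
    """Point of the Euclidean plane with polar coordinates (r, theta)."""
    return (r * math.cos(theta), r * math.sin(theta))

# Hypothetical constant metric: the point (0.25, 0) lies at metric distance 0.5.
stretched = lambda u: [[4.0, 0.0], [0.0, 1.0]]
grid = build_grid(stretched, (0.0, 0.0))
print(to_euclidean(*normal_coords(grid, (0.25, 0.0))))
```

The accuracy of the lookup is limited by the grid spacing, so the angular increment and the radial step play the role of the resolution discussed above.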

6 Results and Applications

The above algorithm was applied to the JND threshold data of [1] to transform the facial expression space into a Euclidean space with Riemann normal coordinates.

The result, in particular its projection onto the 1st-3rd PCA subspace, is shown in Fig. 6.

Fig. 6. Facial expression space transformed to Euclidean space

The effect and accuracy of this transformation from the Riemann space to the Euclidean space can be evaluated by how close the images of the local JND ellipses are to unit circles in the Euclidean space. Fig. 7 shows that the images of the JND ellipses are very close to unit circles, which indicates that the transformation is successful, since a global isometry is also a local isometry and vice versa [27, 28].

Fig. 7. JND ellipses mapped to unit circles in Euclidean space
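The closeness of a mapped JND ellipse to a unit circle can be quantified in several ways. The following sketch shows one possible measure, which is an assumption of this illustration and not necessarily the evaluation behind Fig. 7: points on the ellipse \(d\varvec{x}^TG\,d\varvec{x}=1\) are pushed through a local linear approximation \(d\varvec{y}=J\,d\varvec{x}\) of the transformation, and the largest deviation of their Euclidean norm from 1 is reported. The names `ellipse_points` and `circle_deviation` and the sample matrices are hypothetical.

```python
import math

def ellipse_points(G, n=64):
    """n points dx with dx^T G dx = 1, obtained by rescaling unit directions."""
    pts = []
    for m in range(n):
        t = 2.0 * math.pi * m / n
        d = (math.cos(t), math.sin(t))
        q = G[0][0] * d[0] ** 2 + 2.0 * G[0][1] * d[0] * d[1] + G[1][1] * d[1] ** 2
        pts.append((d[0] / math.sqrt(q), d[1] / math.sqrt(q)))
    return pts

def circle_deviation(G, J, n=64):
    """Largest | |J dx| - 1 | over sampled points of the JND ellipse of G."""
    dev = 0.0
    for dx in ellipse_points(G, n):
        dy = (J[0][0] * dx[0] + J[0][1] * dx[1], J[1][0] * dx[0] + J[1][1] * dx[1])
        dev = max(dev, abs(math.hypot(dy[0], dy[1]) - 1.0))
    return dev

G = [[4.0, 0.0], [0.0, 1.0]]         # hypothetical metric at one expression
J_good = [[2.0, 0.0], [0.0, 1.0]]    # satisfies J^T J = G
J_ident = [[1.0, 0.0], [0.0, 1.0]]   # leaves the ellipse unchanged
print(circle_deviation(G, J_good), circle_deviation(G, J_ident))
```

The deviation vanishes, up to rounding, when \(J^TJ=G\), and grows with the mismatch between the local Jacobian and the measured metric.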

We then applied the above method to facial expression recognition. Instead of measuring differences between input images and template images along straight lines, i.e. by Euclidean distances as in current recognition methods, we measure the subjective differences in the facial expression space by geodesic distances.

The two inputs A and B in Fig. 8 are quite similar as images but very different as expressions. As shown in Fig. 9, they are close to each other in the image space by the straight-line distance shown on the left-hand side, but are actually well separated by the geodesic distance in the Riemann space, which can be read from the polar coordinate system in the Euclidean space on the right-hand side.

The two inputs C and D in Fig. 10 are quite different as images but similar as expressions. As shown in Fig. 11, they are separated in the image space by the straight-line distance shown on the left-hand side, but are actually close to each other by the geodesic distance in the Riemann space, which can be read from the polar coordinate system in the Euclidean space on the right-hand side.

Fig. 8. Two inputs A and B, close in image space but separated by geodesic distance

Fig. 9. Discrepancy between images A and B in image distance and geodesic distance

Fig. 10. Two inputs C and D, separated in image space but close by geodesic distance

Fig. 11. Discrepancy between images C and D in image distance and geodesic distance
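Once inputs and templates have been assigned normal coordinates, comparing them reduces to ordinary plane geometry. The sketch below is only an illustration and not the recognition system evaluated above: it assumes, as stated in the text, that distances in the transformed Euclidean space represent the subjective differences, and the names `polar_distance` and `nearest_template`, the template labels and all coordinate values are hypothetical.

```python
import math

def polar_distance(p, q):
    """Euclidean distance between two points given in polar form (r, theta)."""
    (r1, t1), (r2, t2) = p, q
    # Law of cosines; max() guards against tiny negative values from rounding.
    return math.sqrt(max(0.0, r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(t1 - t2)))

def nearest_template(query, templates):
    """Label of the template closest to the query in the transformed space."""
    return min(templates, key=lambda name: polar_distance(query, templates[name]))

# Hypothetical normal coordinates (r, theta) of two templates and one input.
templates = {"happy": (1.0, 0.0), "surprise": (1.0, math.pi / 2)}
print(nearest_template((0.9, 0.1), templates))
```

The same function can be applied to pairs such as A, B or C, D to compare their distance in the transformed space with their straight-line distance in the image space.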

7 Discussion, Conclusions and Future Work

We showed how to transform the facial expression space into a Euclidean space by Riemann normal coordinates, which is expected to lead to various applications in both theoretical modeling and practical recognition of facial expressions.

Another application is to compare facial expression perception between different observers and, furthermore, to exchange their impressions. For a recent, different approach to this problem see [23].

Estimating the Riemannian metric in a d-dimensional space amounts to measuring the JND thresholds, i.e. the \(d\times d\) symmetric matrices \(G(\varvec{x})=(g_{ij})\), each of which contains \(d(d+1)/2\) independent entries. As the dimension d increases, the number of variables, and therefore the amount of measurement data needed to determine them, increases rapidly. It is therefore important to determine the effective dimension of the facial expression space. At the same time, one needs a fast and accurate procedure for measuring JND threshold hyper-ellipsoids in a high-dimensional space. As shown above, however, applying the algorithm to subspaces that are meaningful for particular applications is also useful.
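For a sense of scale, the count \(d(d+1)/2\) of independent entries of \(G(\varvec{x})\) at a single point \(\varvec{x}\) evaluates as follows; the dimensions below are chosen only for illustration and are not values used in the experiments.

$$ d=2:\ 3, \qquad d=3:\ 6, \qquad d=10:\ 55, \qquad d=50:\ 1275 $$

Since this many unknowns have to be determined at every expression where a threshold ellipsoid is fitted, the measurement effort grows quadratically with the dimension.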

Another implementation issue is that, theoretically, it may not be possible to transform the facial expression space into a single Euclidean space, so that a multi-patch approach is needed. This is related to the topology and curvature of the Riemann space and to the so-called injectivity radius, which should be investigated in the future. For further information see [25, 26].