1 Introduction

Face recognition is one of the most intensively researched topics in biometrics. Its applications include, among others, behaviour interpretation and human-machine interaction and interfaces. Various data representations and classification methods have been proposed [1, 2]. A derivative of the topic is the recognition of psychological emotions from face images or video [3].

The problem has been addressed by different approaches using various data representations and classifiers [1, 4, 5]. The efforts in face recognition have been largely motivated by industrial interests [6]. Many algorithms have been proposed; the majority of them, however, fail to achieve a satisfactory performance level, especially for the recognition of face expressions [7, 8].

The authors propose a new feature extraction algorithm, called FE8R, to recognize eight facial expressions: anger, disgust, fear, happiness, natural, sadness, surprise and cry. The data comes from the MUG Facial Expression Database [9] and the color FERET Database [10], with the addition of the cry expression. The proposed approach relies on the detection of essential objects followed by geometric feature analysis, using directional encoding to feed Artificial Neural Network classifiers.

The paper is organized as follows: Sect. 2 presents the state of the art in the area, Sect. 3 describes the proposed approach in detail, Sect. 4 presents experiments and discusses the results, and finally Sect. 5 draws conclusions and outlines future work.

2 State of the Art

Face expression recognition is a problem closely related to segmentation (finding the face in the image) and facial feature extraction (selection of characteristic points, shapes or areas). Face expression recognition may be defined as the analysis of selected facial features to detect predefined face expression classes, like anger or smile. Therefore it can be seen as a classification problem [11]. The commonly recognized emotions are {anger, disgust, fear, happiness, sadness, surprise}, as in [12, 13]. In this paper the authors also distinguish two other expressions: cry and natural.

2.1 Similar Approaches

Many approaches depend on feature extraction to produce a feature vector. It may consist of two types of features: geometric features (shape and location of segmented features) [14] or appearance features (based on the texture properties of the skin in certain areas) [2, 15]. Extraction of appearance features may be seen as a template matching problem applied to the emotion recognition area [16].

In [17] the authors use feature vectors as an input to an Artificial Neural Network with Radial Basis Function neurons (RBF) in order to recognize facial expressions. The face image is divided into three regions of interest: mouth, nose, and eyes with eyebrows. After applying an unsharp filter with a certain threshold, the facial characteristic points (FCPs) are detected and the Euclidean distances between them are calculated. Two angles in the mouth area are computed to represent geometric features; the two remaining regions, the nose and the eyes, are treated similarly. In total, 19 geometric features and 64 appearance features form the input for the RBF neural network.

The authors of [18, 19] propose an approach for coding facial expressions using Gabor wavelets. Gabor filters are applied to each image at 34 fiducial points to produce a Gabor filter bank response. This results in a feature vector of length 612 (34 fiducial points with 18 filter responses per point) that represents the facial expression. The PCA (Principal Component Analysis) technique [20] is used to order the feature vector components by descending variance; the vector is then truncated at the desired length to check whether it remains sufficient for correct recognition of facial expressions. Input vectors are classified by an LVQ neural network. In the study, the length of the feature vector was set to 90 and the sub-class size varied from 35 to 77 to classify seven different expressions.
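A minimal sketch of such a Gabor-plus-PCA feature pipeline is given below; the chosen frequencies, the 3 × 6 filter bank layout and the helper names are illustrative assumptions, not the exact parameters of [18, 19]:

```python
import numpy as np
from scipy.ndimage import convolve
from skimage.filters import gabor_kernel
from sklearn.decomposition import PCA

def gabor_features(gray, fiducial_points, frequencies=(0.1, 0.2, 0.3), n_orient=6):
    """Sample magnitudes of a Gabor filter bank at the given fiducial points.

    With 3 frequencies and 6 orientations there are 18 responses per point,
    so 34 points yield a 612-element vector (cf. the feature length above).
    `gray` is a 2D float array; `fiducial_points` is a list of (row, col) tuples.
    """
    responses = []
    for freq in frequencies:
        for theta in np.arange(n_orient) * np.pi / n_orient:
            kernel = gabor_kernel(freq, theta=theta)
            real = convolve(gray, np.real(kernel), mode='reflect')
            imag = convolve(gray, np.imag(kernel), mode='reflect')
            magnitude = np.hypot(real, imag)
            responses.append([magnitude[r, c] for r, c in fiducial_points])
    return np.asarray(responses).T.ravel()   # one feature vector per image

def reduce_with_pca(feature_matrix, n_components=90):
    """Order components by descending variance and truncate (here to length 90)."""
    return PCA(n_components=n_components).fit_transform(feature_matrix)
```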

The authors of [21] recognize 7 expressions: angry, disgust, fear, happy, natural, sad, and surprise. They use Gabor filters for facial feature extraction and 6 classification methods. Their major contribution is the analysis of the behaviour of various classifiers (simplified Bayes, AdaBoost, FDLP, SVM, FSLP) in the case of a small number of training examples per class. The method proposed by the authors (FSLP) reached 92.4 % correct emotion recognition over 213 images of 10 people in this setting.

The authors of [22, 23] use a neuro-fuzzy approach to recognize the following expressions: happy, fear, sad, angry, disgust and surprise. The image is segmented into three regions, from which uniform Local Binary Pattern (LBP) texture feature distributions are extracted and represented as a histogram descriptor. An LBP is called uniform if the binary pattern contains at most two bitwise transitions from 0 to 1 or vice versa. In the circular variant, the operator labels each pixel of a gray-level image by comparing its 3\(\,\times \,\)3 neighborhood with the pixel in question at the center, forming an 8-bit binary code. The facial expressions are then recognized using a Multiple Adaptive Neuro-Fuzzy Inference System (MANFIS) [24].
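As an illustration, a minimal sketch of such a uniform LBP region descriptor, built on the scikit-image implementation rather than the exact operator of [22, 23], could look as follows:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_region, P=8, R=1):
    """Uniform LBP histogram of one face region (e.g. mouth, nose or eyes).

    `gray_region` is a 2D uint8 gray-level image. Each pixel is compared with
    its P circular neighbours at radius R, giving an 8-bit code; 'uniform'
    keeps patterns with at most two 0/1 transitions and maps all other
    patterns to one extra label, which yields P + 2 = 10 histogram bins.
    """
    codes = local_binary_pattern(gray_region, P, R, method='uniform')
    hist, _ = np.histogram(codes, bins=np.arange(P + 3), density=True)
    return hist  # histogram descriptor fed to the classifier

# Hypothetical usage: concatenate the histograms of the three face regions.
# descriptor = np.concatenate([lbp_histogram(r) for r in (mouth, nose, eyes)])
```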

The algorithm presented in this paper recognizes the largest number of expressions among the reviewed approaches. It is based on geometric feature extraction and encoding, with ANNs used as classifiers. Detailed results are given in Sect. 4.

2.2 Comparison

A short comparison of the authors’ approach in this paper and similar approaches is presented in Table 1, with additional remarks below.

  1. The approach described in [17], based on geometric and appearance features, uses input images scaled to an array of 300 × 300 pixels with 8-bit grayscale precision. The reported recognition rate is between \(93.5\,\%\) and \(96\,\%\).

  2. In the approach with Gabor wavelets [18], the generalization accuracy is reported to be from \(87.51\,\%\) to \(90.22\,\%\). The recognition rate is close to uniform for all expressions, including fear. The network does not achieve 100 % correct classification, even on the training database.

  3. In the approach of [21] the emphasis is put on a small number of cases in the learning database. The reported accuracy for this small learning set (10 people) is 92.4 %.

  4. In the approach [22], using neuro-fuzzy classification and LBP, a classification accuracy rate of \(94.29\,\%\) is reported.

Table 1. Comparison with other approaches

The work presented in this paper does not require input image scaling. The obtained recognition rate is comparable, despite the fact that the number of recognized expressions is increased to 8.

3 The Proposed Approach: FE8R Algorithm

The authors propose a comprehensive approach to face expression classification, called the FE8R algorithm. The proposed approach relies on the selection of so-called essential objects (EO) and the analysis of geometric features, using either directional encoding or a direct feed to an Artificial Neural Network classifier. Translation and scaling issues are dealt with at a (mostly automated) preprocessing stage.

The face image is converted into grayscale. If required, the image is normalized by adjusting the intensity using the minimum and maximum intensity values in the image. The contrast is enhanced by histogram equalization. Median filtering is applied to reduce noise while preserving the edges. The image is rotated by a suitable angle and its edges are detected. The work area and its center are calculated by dividing the image into four quarters. Characteristic objects are detected using the coordinates of the minimum and maximum points of each object, provided that the object perimeter is larger than the major axis of the ellipse formed for the studied object. The set of characteristic objects is reduced to essential objects (EO) by keeping only objects whose slant between the x-axis and the major axis lies in the range \([-15,+35]\) degrees; this reduces the number of objects by at least 80 %. Two methods are then applied. The first method depends on the slant: the object center is fixed and the slant is tracked from the starting point, examining whether the sign of the slant change alternates. From this slant progression the face expression can be classified, for example as happy or sad. The second method uses feature coding: the important points of the objects are extracted and coded with the BPCC algorithm [25]. The filtered code is formed into an array of 81 elements, which is then scaled and used as the input of a backpropagation ANN.

Two classification methods are used: the first tracks points on the basis of a fixed slant, where the slant is calculated from the object center point and the face center, whilst the second encodes the data for the ANN.

In the first method, the slant variation is analyzed after fixing the object center with respect to its starting point. If the slant varies clearly without the slant sign alternating compared with its preceding value, the face is classified as happy; if the variation comes with alternating slant sign, the face is classified as sad. The second method extracts the characteristic points by detecting the end points, start points, branch points and turning pixels. The object is coded from the detected pixels using eight digit pairs in two counterclockwise passes. The code is filtered according to eight chosen basic pairs. The calculations form an array of 9\(\,\times \,\)9 elements, to which the percentage of points distributed in each part of the studied object is added; the resulting matrix thus contains 85 elements. Finally, the matrix is passed to the ANN for classification.
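The decision rule of the first method can be summarized by the following sketch; the helper name and the treatment of flat segments are hypothetical, since only the sign-alternation test is specified above:

```python
import numpy as np

def slant_sign_alternates(slants):
    """Return True if consecutive slant differences alternate in sign.

    `slants` is the sequence of slants measured while moving along the
    object points (object center fixed, starting from the start point).
    """
    diffs = np.diff(np.asarray(slants, dtype=float))
    diffs = diffs[diffs != 0.0]                 # ignore flat segments (assumption)
    return bool(np.any(np.sign(diffs[1:]) != np.sign(diffs[:-1])))

# Hypothetical usage following the rule described above:
# clear slant variation without sign alternation -> 'happy',
# variation with alternating sign                -> 'sad'.
# label = 'sad' if slant_sign_alternates(slants) else 'happy'
```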

Subsequent phases of the algorithm are detailed in the following sections.

Fig. 1.

Face image preprocessing: (a) color image, (b) normalized image, (c) image after histogram equalization, (d) image after median filter, (e) rotated image, (f) edge detection

3.1 Face Image Preprocessing

Face image requires preprocessing to improve the image quality [26–29]. The first step is converting the color image into a grayscale image, as shown in Fig. 1a. The second step is image normalization, performed in two phases [30]: first the intensity is adjusted with respect to the minimum and maximum intensity values, then the contrast is enhanced with histogram equalization, as shown in Fig. 1b, c. The third step is filtering with a median filter in order to reduce noise, as presented in Fig. 1d. A median filter is considered more effective than convolution when the goal is to simultaneously preserve the edges [31]. In the fourth step, if required, the image is rotated by a suitable angle around its center (the rotation procedure from [28] is used), as shown in Fig. 1e. The fifth step is detecting the edges with the Canny method [32], using threshold 0.08 and sigma 1 (determined empirically), as depicted in Fig. 1f.
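A minimal sketch of this preprocessing chain, written with scikit-image (the Canny parameters 0.08 and 1 come from the text, while the normalization details and the rotation angle are assumptions), is shown below:

```python
from skimage import color, exposure, feature, filters, transform
from skimage.morphology import disk

def preprocess_face(rgb_image, rotation_deg=0.0):
    """Grayscale -> intensity adjustment -> histogram equalization ->
    median filter -> rotation -> Canny edges (cf. Fig. 1a-f)."""
    gray = color.rgb2gray(rgb_image)                    # (a) grayscale, values in [0, 1]
    adjusted = exposure.rescale_intensity(gray)         # (b) stretch between min and max intensity
    equalized = exposure.equalize_hist(adjusted)        # (c) histogram equalization
    denoised = filters.median(equalized, disk(1))       # (d) median filter: noise reduced, edges kept
    rotated = transform.rotate(denoised, rotation_deg)  # (e) rotation around the image center
    edges = feature.canny(rotated, sigma=1,
                          high_threshold=0.08)          # (f) Canny; 0.08 used as the upper threshold
    return edges
```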

3.2 Extraction of Characteristic Objects

The authors consider horizontal lines (or edges) to be more influential for face expressions, so the goal is to find edges of this type. After the edges are detected, the work area and its center are determined and represented as \(16 \times 8\) squares. Each studied object is enclosed in an ellipse, and the characteristic objects are extracted using the minimum and maximum coordinates of each object [1, 33] and the gradient between the x-axis and the major axis [29], as shown in Fig. 2. The characteristic object boundaries are drawn in blue. Among all characteristic objects, some are labeled as essential objects, depending on the slant: the authors require the angle between the x-axis and the major ellipse axis to lie between \(-15\) and \(+35\) degrees (determined empirically). Two methods are applied: the first is slant analysis of the ConvexHull [34] points, transformed into a suitable input for the ANN; the second encodes the important points of the essential objects using the BPCC algorithm, in order to recognize the anger, disgust, fear, happiness, natural, sadness, surprise and cry expressions.
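A sketch of this selection using scikit-image region properties is given below; the handling of the ellipse orientation and of the bounding-box extents is an approximation of the rule described above, since axis and sign conventions differ between implementations:

```python
import numpy as np
from skimage.measure import label, regionprops

def find_essential_objects(edge_image, slant_range=(-15.0, 35.0)):
    """Label connected edge components, keep 'characteristic' objects
    (wider than tall, or with a long major axis) and promote those whose
    major-axis slant against the x-axis falls inside `slant_range`."""
    essential = []
    for obj in regionprops(label(edge_image)):
        min_r, min_c, max_r, max_c = obj.bbox
        horz_line = max_c - min_c                    # horizontal extent of the object
        vert_line = max_r - min_r                    # vertical extent of the object
        # regionprops orientation is the angle (rad) between the row axis and
        # the major axis, in (-pi/2, pi/2]; convert to a slant vs. the x-axis.
        theta = np.degrees(obj.orientation)
        slant = 90.0 - theta if theta >= 0 else -(90.0 + theta)
        is_characteristic = (horz_line > vert_line or
                             obj.major_axis_length > vert_line)
        if is_characteristic and slant_range[0] <= slant <= slant_range[1]:
            essential.append(obj)
    return essential
```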

Fig. 2.

An image and its boundaries; (a) primary image; (b) work area defined by yellow points in the figure, (c) the center in red and extracted characteristic objects in blue (Color figure online)

3.3 Feature Extraction Algorithm

Feature extraction leads to detection and selection of essential objects. This is given as a flowchart in Fig. 3.

Fig. 3.

The flowchart of the essential objects detection and selection

Detailed steps of feature extraction are as follows:

  1. The center point of the face M(X_k, Y_k) is found in the image.

  2. The work area is calculated and its center is detected. The area is defined as the middle of the face space containing the eyes, nose and mouth; that space covers about three-quarters of the face. This area is divided into \(16 \times 8\) squares. The square length is denoted as Square_line_Y.

  3. Exterior boundaries of objects are traced, with any holes inside these objects neglected. These objects are parents, while their children boundaries represent continuous regions in the studied image.

  4. For each object and its boundaries, the analysis is performed as follows:

    (a) The center of the object is located and denoted by Center_Poi.

    (b) The minima of the coordinates on the X-axis and Y-axis are calculated; they are denoted by X-obj_min and Y-obj_min, respectively. Similarly, the maxima are denoted by X-obj_max and Y-obj_max.

    (c) The horizontal line horz_line is represented by the difference between the maximum and minimum in the Y axis.

    (d) The vertical line vert_line is represented by the difference between the maximum and minimum in the X axis.

    (e) An ellipse containing the object is formed; the lengths of its major and minor axes are calculated and denoted by MajorAxisLength and MinorAxisLength.

    (f) The slant between the x-axis and the major axis of the ellipse is calculated and denoted by obj_slant.

  5. If the horz_line or the MajorAxisLength is larger than the vert_line, then the object is a characteristic object.

  6. A characteristic object is promoted to an essential object if at least one of the following conditions applies:

    (a) The horz_line or the MajorAxisLength is larger than the vert_line

    (b) The horz_line is larger than the Square_line_Y

    (c) The obj_slant is larger than \(-15\) and less than \(+35\) degrees

  7. Two algorithm variations were studied for classification of essential objects:

    (a) The first variant operates on the slant, which is calculated between a fixed point, the object center, and a second point, the face center. The images come from the MUG Database [9] (plus a set of images prepared by the authors for the cry expression) and the method is applied to 8 expressions (anger, cry, disgust, fear, happiness, natural, sadness, surprise). The characteristic points of the essential objects are then computed in two different ways and processed by the ANN, as 100 features in the first method and 81 in the second. The approach was also applied to the FERET Database [10] for 4 expressions (cry, sad, laugh and smile), and in that case the recognition accuracy was over 90 %.

      (i) For each detected essential object the following steps are performed:

        • The Convex Hull points [6] are found. They are denoted by ConHull_Poi.

        • After the face image is divided into four quarters, the search for the start and end points of the object begins in the first quarter and proceeds counterclockwise. A point is considered a start point or an end point if the sum of the 8-neighboring pixels of the tested point equals 1. These points are denoted by Start_Poi and End_Poi.

      (ii) The slant between the ConHull-point and the face center is calculated as in Eq. 1.

        $$\begin{aligned} Slant_1= \frac{\left( ConHull\_Poi\_Y-Center\_i\_Y\right) }{\left( ConHull\_Poi\_X-Center\_i\_X \right) } \end{aligned}$$
        (1)
      (iii) The slant between the ConHull-point and the object center is formed as in Eq. 2.

        $$\begin{aligned} Slant_2= \frac{\left( ConHull\_Poi\_Y-Center\_Poi\_Y\right) }{\left( ConHull\_Poi\_X-Center\_Poi\_X\right) } \end{aligned}$$
        (2)
      (iv) The slant average MSlant is calculated as in Eq. 3:

        $$\begin{aligned} MSlant= \left( Slant_2+Slant_1 \right) /2 \end{aligned}$$
        (3)
      (v) The steps from (ii) to (iv) are repeated for all ConHull-points of the object. Figure 4(b, c, d, e, f) presents the ConvexHull points, essential objects and face center, as well as the way the required calculations are performed.

      (vi) The steps from (i) to (v) are repeated for each essential object. The result is a slant vector that is scaled to 100 elements to form the input of the ANN (a sketch of this computation is given after the list).

    (b) The second variant uses feature encoding: the important points of the essential objects (start and end points of straight pieces, start and end points of objects) are extracted. These points are encoded following the BPCC algorithm [25]. Figure 4(h–j) presents these points. The essential objects are formed into an array of 81 elements, which is then used as the input to a backpropagation ANN. The maximum recognition accuracy achieved in this way was 95 %.
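A minimal sketch of the slant computation of variant (a), following Eqs. (1)–(3), is given below; the resampling used to scale the slant vector to 100 elements and the small epsilon guarding against vertical segments are assumptions:

```python
import numpy as np
from scipy.spatial import ConvexHull

def slant_vector(objects, face_center, n_features=100):
    """Compute MSlant for every ConvexHull point of every essential object
    (Eqs. 1-3) and resample the result to a fixed-length ANN input vector.

    `objects` is a list of (N_i, 2) arrays with the (x, y) pixel coordinates
    of each essential object; `face_center` is the (x, y) face center."""
    slants = []
    fx, fy = face_center
    for pts in objects:
        cx, cy = pts.mean(axis=0)                   # object center (Center_Poi)
        hull_pts = pts[ConvexHull(pts).vertices]    # ConHull_Poi
        for hx, hy in hull_pts:
            slant1 = (hy - fy) / (hx - fx + 1e-9)   # Eq. (1): hull point vs. face center
            slant2 = (hy - cy) / (hx - cx + 1e-9)   # Eq. (2): hull point vs. object center
            slants.append(0.5 * (slant1 + slant2))  # Eq. (3): MSlant
    slants = np.asarray(slants)
    # Scale (resample) the slant vector to n_features elements for the ANN input.
    old_x = np.linspace(0.0, 1.0, num=len(slants))
    new_x = np.linspace(0.0, 1.0, num=n_features)
    return np.interp(new_x, old_x, slants)
```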

Fig. 4.

Characteristic points of essential objects detected in the image (a): (b) Convex Hull points in blue, essential object centers in red, (c) two of the ConHull points in cyan, (d–f) present the distance between the ConHull points and the object center (face center) shown by yellow arrows; movement goes counterclockwise. (h–j) present essential points: start and end points of straight pieces in green, start and end points of objects in red and branch points in magenta. (Color figure online)

3.4 Selecting Essential Objects from Characteristic Objects

The less important characteristic objects detected in the image (about 80 %) are eliminated, leaving only the essential objects. An example of detected essential objects is depicted in Fig. 4.

Figures 5 and 6 present examples of characteristic and essential objects; characteristic object boundaries are blue and yellow, whilst the essential object boundaries are green. Four images from the MUG database [9] are shown in Fig. 5, and a sample ‘cry’ image produced by the first author is presented in Fig. 6.

Fig. 5.

Four images from the MUG database [9]: examples of characteristic and essential objects. Consecutive columns are: (a) original color image, (b) black and white image after edge detection, (c) image with detected characteristic object boundaries in blue and yellow, the face center marked with a red point, (d) essential objects in green with centers in red. (Color figure online)

Fig. 6.

A sample ‘cry’ image prepared by the first author: examples of characteristic and essential objects. Consecutive columns are: (a) original color image, (b) black and white image after edge detection, (c) image with detected characteristic object boundaries in blue and yellow, the face center marked with a red point, (d) essential objects in green with centers in red. (Color figure online)

3.5 Configuration of Backpropagation Artificial Neural Networks

For the first method, the Convex Hull points are defined, and the slants between these points and the object centers, and then the face center, are found. The average slant is calculated and scaled to 100 features to serve as the input to the ANN.

For the second method, the FBPCC method [25] is applied to the essential objects and the important points (start, end and branch points) are detected. These points are coded during two counterclockwise passes: first diagonal, then perpendicular. Eight sets of binary pairs are used to filter the resulting codes. The filtered data is formed into an array of 81 elements.

The neural network contains 81 neurons in the input layer and {550; 350; 250} neurons in the 3 hidden layers. The output layer has 8 neurons, one per face expression: anger, cry, disgust, fear, happiness, natural, sadness, surprise. The training parameters of the ANN are set as follows: the momentum coefficient is 0.25, the learning rate is 0.05 and the sum square error is 0.00000585. The ANN uses a bipolar activation function. The ANN weights and biases are generated randomly. The values of the input vector lie between \(-1\) and +1.
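For reference, a minimal sketch of an equivalent configuration using scikit-learn's MLPClassifier is shown below; tanh stands in for the bipolar activation, and the iteration limit and tolerance are assumptions replacing the sum-square-error target quoted above:

```python
from sklearn.neural_network import MLPClassifier

# 81 inputs -> three hidden layers (550, 350, 250) -> 8 expression classes.
# Learning rate 0.05 and momentum 0.25 follow the text; tanh approximates
# the bipolar activation, and weights/biases are initialized randomly.
model = MLPClassifier(hidden_layer_sizes=(550, 350, 250),
                      activation='tanh',
                      solver='sgd',
                      learning_rate_init=0.05,
                      momentum=0.25,
                      max_iter=5000,
                      tol=1e-8)

# Hypothetical usage: X has shape (n_samples, 81) with values scaled to [-1, 1],
# y holds the 8 expression labels.
# model.fit(X, y)
# predictions = model.predict(X_test)
```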

4 Results and Discussion

The ANN is trained on 52 faces (4 images, 8 expressions) over 184273 epochs. The faces are taken from the MUG Facial Expression Database [9] and the color FERET Database [10] with different face expressions. After training, the ANN is tested on a new set of faces (the generalization database).

We tested the ANN on 32 new samples that can be split into 8 groups. We noticed some errors in the ANN recognition of facial expressions, which were reduced using the FBPCC method [25].

Comparing the results of the two methods, we noticed errors in distinguishing between the angry, disgusted and fear expressions, between the sad and cry expressions, and between the happy, natural and disgusted expressions. The main reason for such errors is that the method neglects vertical objects and focuses on horizontal characteristic objects; the rate is therefore raised in some cases and lowered in others.

5 Conclusions and Future Work

This paper proposes a new approach for the classification of eight face expressions: anger, cry, disgust, fear, happiness, natural, sadness, surprise. The most important contribution of the paper is the design and implementation of a feature extraction method based on the detection and selection of characteristic objects in the image. The preselected characteristic objects are studied in two ways: the first uses slant tracking, the second is based on feature encoding using the BPCC algorithm with classification by backpropagation Artificial Neural Networks. The correct classification rates were close to 50 % and 95 %, respectively. The second method proved to be fast and produced promising results. The method is unsupervised, flexible and can be adapted to recognize even more face expressions. The proposed method provides high efficiency for face expression recognition and can be recommended for further research and study. Future work in this area will include increasing the number of classes (face expressions), applying other, possibly more viable, classification methods, and testing the proposed approach on other databases, preferably with many subjects. Moreover, the authors are working on ways of taking vertically oriented objects into account in order to better distinguish similar expressions such as sad and cry.