
1 Introduction

Video surveillance is widely used in the field of public security. Since face information has the advantages of unique identification and easy accessibility, face detection and tracking in video sequences has become one of the most important means of locating and tracking criminal suspects. The human head is a three-dimensional object, so one dimension of information is lost when it is projected onto a two-dimensional video image. When a 3D face is projected to 2D from different angles, different parts of the face are stretched or compressed, leading to large differences between 2D faces at different angles. Most existing face detection methods require the detected person to face the camera frontally. In many cases, however, due to the movement of the detected person and varying imaging angles, it is difficult to obtain a frontal image. Multi-angle face detection has therefore become a research hotspot.

Many methods have been proposed for multi-angle face detection. In some research, Haar-like features were extended in various ways and classifiers were constructed with the Adaboost algorithm to accomplish multi-angle face detection [1,2,3,4]. Methods based on different skin color models were proposed for coarse detection to improve efficiency [5, 6]. Guo proposed an Adaboost-SVM algorithm in which Haar-like and edge-orientation field features were fused, combined with an improved decision tree cascade structure, to carry out multi-angle face detection [7]. However, because the number of Haar-like features is very large, and grows even larger when adapting to different angles, training with Haar-like features is quite slow. A method based on the Multi-Block Local Binary Pattern (MB-LBP) feature was therefore proposed to detect faces [8]. Reference [9] proposed a method based on the MB-LBP feature and controlled cost-sensitive Adaboost (CCS-Adaboost). Reference [10] proposed an algorithm that cascades two SVM classifiers, trained on HOG and LBP features respectively, to implement face detection.

In this paper, a multi-angle face detection method based on multi-feature fusion is proposed. First, three single-feature Adaboost classifiers are constructed and trained on preprocessed training samples. Second, preprocessed testing samples are sent to the three classifiers for initial detection, yielding candidate face regions and their weights. Finally, the refined face regions are obtained by voting and weighted calculation. The performance of the method is evaluated in terms of accuracy and efficiency.

The rest of the paper is organized as follows. The method of multi-angle face detection is discussed in Sect. 2. Simulation experiment and analysis are presented in Sect. 3. Conclusions are given in Sect. 4.

2 Method of Multi-angle Face Detection

The multi-angle face detection method based on multi-feature fusion is divided into three parts: preprocessing, training, and detection, as shown in Fig. 1.

Fig. 1. Flowchart of multi-angle face detection

2.1 Preprocessing

Because the imaging conditions of the testing samples differ, illumination compensation and histogram equalization are applied to reduce their effects. Since testing samples contain not only face regions but also non-face skin regions that must be screened out, testing samples are also normalized to a size of 300 × 300.
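
For illustration, the preprocessing stage can be sketched in a few lines of Python with OpenCV. The paper does not specify which illumination compensation algorithm it uses, so a simple gray-world white balance stands in for it here; the function name `preprocess` and equalizing only the luminance channel are likewise illustrative choices, not the authors' implementation.

```python
import cv2
import numpy as np

def preprocess(bgr):
    """Illustrative preprocessing: gray-world illumination compensation,
    histogram equalization on luminance, and 300x300 normalization."""
    # Gray-world white balance (one common compensation; the paper does
    # not state which method it applies).
    b, g, r = cv2.split(bgr.astype(np.float32))
    k = (b.mean() + g.mean() + r.mean()) / 3.0
    balanced = cv2.merge([np.clip(c * k / max(c.mean(), 1e-6), 0, 255)
                          for c in (b, g, r)]).astype(np.uint8)
    # Equalize only the luminance channel, leaving chrominance untouched.
    ycrcb = cv2.cvtColor(balanced, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    out = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
    # Normalize to the 300x300 size used for testing samples.
    return cv2.resize(out, (300, 300))
```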

The skin color feature has good clustering characteristics, and skin color information can be separated from the background. In the \( YC_bC_r \) color space, \( Y \) represents luminance, while \( C_b \) and \( C_r \) represent chrominance. The conversion from RGB to \( YC_bC_r \) is calculated as follows:

$$ \begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.1687 & -0.3313 & 0.5 \\ 0.5 & -0.4187 & -0.0813 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix} $$
(1)

Skin color in the \( C_b \)-\( C_r \) color space is aggregated into an elliptic model, as in (2) and (3).

$$ \frac{(x - ec_x)^2}{a^2} + \frac{(y - ec_y)^2}{b^2} = 1 $$
(2)
$$ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} C_b' - c_x \\ C_r' - c_y \end{bmatrix} $$
(3)

where \( c_x = 109.38 \), \( c_y = 152.02 \), \( \theta = 2.53 \), \( ec_x = 1.60 \), \( ec_y = 2.41 \), \( a = 25.39 \), and \( b = 14.03 \). A pixel that satisfies (2) and (3) is considered skin color and recorded as 1; otherwise it is recorded as 0, yielding a binary image. After morphological processing, the skin region is obtained.
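
A minimal sketch of this skin segmentation, assuming OpenCV's Python bindings: `cv2.cvtColor` implements the conversion of (1) (note that OpenCV orders the channels as Y, Cr, Cb), and the rotation and ellipse test follow (2) and (3) with the constants above. The 5 × 5 morphological kernel is an assumed choice.

```python
import cv2
import numpy as np

# Ellipse parameters of the Cb-Cr skin model given above.
CX, CY, THETA = 109.38, 152.02, 2.53
ECX, ECY, A, B = 1.60, 2.41, 25.39, 14.03

def skin_mask(bgr):
    """Binary skin mask from the elliptical Cb-Cr model, Eqs. (1)-(3)."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    cr, cb = ycrcb[:, :, 1], ycrcb[:, :, 2]
    # Rotate the (Cb, Cr) plane by theta around the cluster center, Eq. (3).
    c, s = np.cos(THETA), np.sin(THETA)
    x = c * (cb - CX) + s * (cr - CY)
    y = -s * (cb - CX) + c * (cr - CY)
    # Inside-ellipse test, Eq. (2).
    inside = (x - ECX) ** 2 / A ** 2 + (y - ECY) ** 2 / B ** 2 <= 1.0
    mask = (inside * 255).astype(np.uint8)
    # Morphological opening and closing to clean up the binary image.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
```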

2.2 Feature Extraction

In this paper, Haar-like, MB-LBP, and HOG features are extracted, and multi-angle face detection classifiers are constructed by training on a large number of face and non-face samples.

Haar-like feature templates consist of simple combinations of rectangles, in which the white and black rectangles are identical in size [11]. The feature value of a template is defined as the difference between the sum of the pixels in the white rectangles and that in the black rectangles, which reflects local changes of gray level in the image [12]. There are three kinds of commonly used Haar-like features: edge features, linear features, and diagonal features, mainly representing horizontal, vertical, and diagonal information. Because the Haar-like feature is sensitive to edges and line segments, it is often used to distinguish face regions from non-face regions. Figure 2(a) and (b) show edge features in the horizontal and vertical directions, Fig. 2(c) and (d) show linear features in the horizontal and vertical directions, and Fig. 2(e) shows the diagonal feature [13].

Fig. 2. Haar-like feature

The traditional way to calculate a feature value is to sum the gray values of each pixel region directly, which is computationally expensive, so the integral image is used to simplify the calculation. For an original image I(x, y), the integral image ii(x, y) is the sum of the gray values in the black area of Fig. 2(f), and it can be defined as:

$$ ii(x,y) = \sum_{x' \le x,\, y' \le y} I(x',y') $$
(4)

where \( I(x', y') \) is the gray value at \( (x', y') \); \( ii(x, y) \) can be computed by (5) and (6),

$$ s(x,y) = s(x,y-1) + I(x,y) $$
(5)
$$ ii(x,y) = ii(x-1,y) + s(x,y) $$
(6)

where \( s(x,y) \) is the cumulative row sum of the original image, as in (7),

$$ s(x,y) = \sum_{y' \le y} I(x,y') $$
(7)

with the boundary conditions \( s(x,-1) = 0 \) and \( ii(-1,y) = 0 \).

As shown in Fig. 2(g), the sum of the gray values of area D can be calculated from \( ii_{1} \), \( ii_{2} \), \( ii_{3} \), and \( ii_{4} \): since \( ii_{1} = A \), \( ii_{2} = A + B \), \( ii_{3} = A + C \), and \( ii_{4} = A + B + C + D \), it follows that \( D = ii_{4} + ii_{1} - (ii_{2} + ii_{3} ) \).
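
The integral image and the four-lookup rectangle sum can be written compactly with NumPy, as in the following illustrative sketch; `cv2.integral` returns the same zero-padded array directly.

```python
import numpy as np

def integral_image(img):
    """Integral image ii(x, y) of Eq. (4): two orthogonal cumulative
    sums implement the one-pass recurrences of Eqs. (5) and (6)."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of gray values in the w x h rectangle with top-left corner
    (x, y), computed as D = ii4 + ii1 - (ii2 + ii3), Fig. 2(g).
    Zero-padding encodes the boundary conditions s(x,-1) = ii(-1,y) = 0;
    arrays are indexed [row, column], i.e. [y, x]."""
    p = np.pad(ii, ((1, 0), (1, 0)))
    return p[y + h, x + w] + p[y, x] - p[y, x + w] - p[y + h, x]
```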

The LBP (Local Binary Pattern) feature is defined in a 3 × 3 neighborhood. As shown in Fig. 3(a), the gray value of the center pixel of the neighborhood serves as the threshold. Each of the 8 neighboring pixels is compared with this threshold: if its gray value is greater, it is recorded as 1, otherwise as 0, producing an 8-bit binary number (10010110 in the example). This binary number is then converted to a decimal number to obtain the LBP code (150) of the center pixel, which reflects the texture information of the region [14].

Fig. 3. The process of LBP and MB-LBP feature extraction

MB-LBP (Multi-Block Local Binary Pattern) is an improvement of the LBP feature. As shown in Fig. 3(b), the rectangular region is divided into image blocks, each block is divided into small areas, and the average gray value of the small areas is taken as the gray value of the image block. LBP coding over pixel gray values is thus converted into coding over image blocks. If the block grid is 3 × 3 and each small area is a single pixel, the MB-LBP feature is identical to the LBP feature, as shown in Fig. 4. Accordingly, when the image size is small, the gray value of each pixel corresponds to the average gray value of an image block in a larger image, which is equivalent to extracting the MB-LBP feature. For an image of size 24 × 24, with the upper left corner as the origin, 6 × 6-pixel image blocks, and 50% overlap between adjacent blocks, there are 49 blocks in total, giving a 49 × 256 = 12544-dimensional MB-LBP feature (a code sketch is given after Fig. 4).

Fig. 4. LBP and MB-LBP feature image
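
A sketch of MB-LBP extraction over a zero-padded integral image (as produced by `cv2.integral`, or by padding `integral_image` above). The strict-greater comparison matches the LBP example above; the clockwise bit order is an assumed convention, since the text does not fix one.

```python
def mb_lbp_code(ii_pad, x, y, s):
    """MB-LBP code for a 3x3 grid of s x s-pixel blocks whose top-left
    corner is (x, y). Each block sum costs four integral-image lookups;
    comparing block sums equals comparing block means, since the common
    1/s^2 factor cancels. With s = 1 this is exactly the 3x3 LBP code."""
    def block_sum(bx, by):
        return (ii_pad[by + s, bx + s] + ii_pad[by, bx]
                - ii_pad[by, bx + s] - ii_pad[by + s, bx])
    # Mean-proportional value of each of the 9 blocks.
    grid = [[block_sum(x + j * s, y + i * s) for j in range(3)]
            for i in range(3)]
    center = grid[1][1]
    # Neighboring blocks in clockwise order from the top-left; the bit
    # order is a convention that just has to be applied consistently.
    neighbors = [grid[0][0], grid[0][1], grid[0][2], grid[1][2],
                 grid[2][2], grid[2][1], grid[2][0], grid[1][0]]
    code = 0
    for bit, v in enumerate(neighbors):
        code |= int(v > center) << (7 - bit)
    return code
```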

The HOG (Histogram of Oriented Gradients) feature is obtained by calculating histograms of the oriented gradients of local image regions, and describes the edge or gradient information of a local region. To calculate it, the image is first grayed and gamma-normalized. Second, the gradient magnitude and direction of each pixel are computed with the \( [-1, 0, 1] \) and \( [1, 0, -1]^{T} \) gradient operators. Third, the image is divided into small cells and the range 0°–360° is divided into nine bins; the 9-dimensional HOG vector of each cell is obtained by accumulating weighted votes for gradient orientation over the pixels of the cell. Blocks are composed of several cells, and the HOG feature of a block is obtained by concatenating and normalizing the HOG vectors of its cells. Finally, the HOG features of all blocks are concatenated to obtain the HOG feature of the whole image. For an image of size 24 × 24, with the upper left corner as the origin, 3 × 3-pixel cells, 2 × 2-cell blocks, and 50% overlap between adjacent blocks, there are 49 blocks in total, giving a 49 × 4 × 9 = 1764-dimensional HOG feature.
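
The cell and block layout above can be checked against scikit-image's `hog` implementation; this is only a convenient reference, not the classifier used in the paper, and note that `skimage` bins unsigned orientations over 0°–180° by default rather than 0°–360°, which does not affect the dimensionality.

```python
import numpy as np
from skimage.feature import hog

# Stand-in 24x24 grayscale window; a real detector would slide this
# window over the image. 3x3-pixel cells give 8x8 cells; 2x2-cell blocks
# with a one-cell (50%) stride give 7x7 = 49 blocks of 36 values each.
patch = np.random.rand(24, 24)
feat = hog(patch, orientations=9, pixels_per_cell=(3, 3),
           cells_per_block=(2, 2), block_norm='L2-Hys')
print(feat.shape)  # (1764,)
```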

2.3 Classifier Building

Adaboost is an algorithm that linearly combines many classifiers into a much better one. First, the dataset is trained to obtain weak classifiers through voting. Then, a different weight is set for each weak classifier to achieve the global optimum. Finally, strong classifiers are combined according to the cascade structure [15]. As shown in Fig. 5, the Adaboost cascade classifier is a coarse-to-fine structure, where Y denotes a face area and N denotes a non-face area. The three features of the training samples are extracted to construct the Haar-like, MB-LBP, and HOG classifiers, respectively (an illustrative detection sketch is given after Fig. 5).

Fig. 5. Flowchart of Adaboost cascade classifier
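
Assuming each trained cascade has been saved as an XML file (cf. Sect. 3.1), the three classifiers can be run per image as sketched below. The file names are hypothetical, and using the stage confidences returned by `detectMultiScale3` as the rectangle weights \( w_i \) is an assumption, since the paper does not state how the weights are produced; note also that stock OpenCV dropped runtime support for HOG-type cascades after the 2.4.x series used in Sect. 3.

```python
import cv2
import numpy as np

# Hypothetical file names for the three trained cascades of Sect. 3.1.
CASCADES = [cv2.CascadeClassifier(f) for f in
            ('haar_face.xml', 'mblbp_face.xml', 'hog_face.xml')]

def detect_with_weights(gray):
    """Run all three cascades on a grayscale image; with
    outputRejectLevels=True, detectMultiScale3 also returns a confidence
    per rectangle, used here as the rectangle weight w_i."""
    results = []
    for clf in CASCADES:
        rects, _, weights = clf.detectMultiScale3(
            gray, scaleFactor=1.1, minNeighbors=3, outputRejectLevels=True)
        results.append(list(zip(rects, np.ravel(weights))))
    return results  # one list of (rect, weight) pairs per classifier
```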

If the classifier is based on a single feature, then for the same sample there will be situations in which feature A detects the face correctly while feature B does not, or feature A mistakenly regards an area as a face while feature B rules it out. Therefore, as shown in Fig. 6, this paper proposes a method that combines the three features to integrate these classifiers.

Fig. 6. Flowchart of classifier building

Samples are sent to the three classifiers to obtain suspected face areas and their weights. Voting is then performed: when an area is detected by at least two classifiers, weighting is applied, and when the resulting weight is greater than a threshold, the area is regarded as a face. Voting and integration are calculated as follows.

First, classify the rectangles output by the three classifiers. Record the location and weight \( w_i (i \ge 0) \) of each rectangle, and let each rectangle correspond to an index \( m_i \). If the overlap area of two rectangles is more than 0.5 of the smaller rectangle's area, the two rectangles are regarded as one category. Count the number of categories by traversing all rectangles, and let each category correspond to an index \( n_j (0 \le j \le i) \).

Second, calculate the fusion weight \( W_j \) of each category. After traversing all rectangles of a category, set the biggest weight as the initial weight \( W_j \) of the category and record the index of the corresponding rectangle. Then fuse the remaining rectangles \( m_k (k \le i) \) of the category with the rectangle of the biggest weight, and calculate the proportion \( v \) of the fused rectangle relative to the larger rectangle. The fusion weight is calculated as in (8).

$$ W_j = W_j + w_k \times v $$
(8)

Finally, perform voting, weighting, and threshold decision. If the number of rectangles in a category is greater than one, i.e. the area is detected by at least two classifiers, calculate the integrated weight as in (9),

$$ \mathit{Weight} = W_j^2 + \mathit{num}^2 $$
(9)

where \( Weight \) represents the total weight, \( W_j \) the fusion weight of the category, and \( num \) the number of rectangles in the category. The final detection area is output according to the threshold.
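
The grouping, fusion, and thresholding steps can be sketched as follows. The greedy one-pass grouping and the reading of \( v \) as the overlap area divided by the larger rectangle's area are interpretations of the description above; `detections` is the per-classifier output of `detect_with_weights` from the earlier sketch, and the final threshold is left as a parameter since its value is not reported.

```python
def area(r):
    return r[2] * r[3]

def inter_area(r1, r2):
    """Area of the intersection of two (x, y, w, h) rectangles."""
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    iw = max(0, min(x1 + w1, x2 + w2) - max(x1, x2))
    ih = max(0, min(y1 + h1, y2 + h2) - max(y1, y2))
    return iw * ih

def fuse(detections, threshold):
    """Group rectangles from the three classifiers into categories and
    apply Eqs. (8) and (9)."""
    rects = [(tuple(map(int, r)), float(w))
             for per_clf in detections for r, w in per_clf]
    # Same category if the overlap exceeds 0.5 of the smaller rectangle.
    groups = []
    for r, w in rects:
        for g in groups:
            r0 = g[0][0]
            if inter_area(r, r0) > 0.5 * min(area(r), area(r0)):
                g.append((r, w))
                break
        else:
            groups.append([(r, w)])
    faces = []
    for g in groups:
        num = len(g)
        if num < 2:          # detected by only one classifier: discard
            continue
        g.sort(key=lambda rw: rw[1], reverse=True)
        best, W = g[0]       # biggest weight seeds the fusion weight W_j
        for r, w in g[1:]:
            # v: overlap relative to the larger rectangle (one reading
            # of the definition of v above).
            v = inter_area(r, best) / max(area(r), area(best))
            W += w * v       # Eq. (8)
        if W ** 2 + num ** 2 > threshold:   # Eq. (9)
            faces.append(best)
    return faces
```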

The experiment is carried out on the test set using the four kinds of classifiers, and performance is evaluated in terms of accuracy and real-time performance.

3 Simulation Experiment and Result Analysis

The simulation experiment is carried out on the VS2012 platform with OpenCV 2.4.8. Face training samples are taken from the CAS-PEAL database [16] and the MIT database; after histogram equalization and size normalization, 8000 of them constitute the face training set. Non-face training samples are taken from the CMU and MIT databases; after size normalization, 24000 of them constitute the non-face training set. Testing samples are taken from FDDB [17] and comprise 135 images containing 224 faces.

3.1 Simulation Experiment

The process of classifier training is divided into five parts, as shown in Fig. 7: initialization, loading samples, logical judgment, computation, and saving. First, initialize the maximum false rate per stage to 0.5 and the number of strong classifier stages to 30. Second, load face samples from the pos2424.vec file as positive samples and non-face samples listed in neg.txt as negative samples. Third, judge whether the false rate and the number of strong classifier stages have reached their initialized values; if either condition is satisfied, go to the fifth part; otherwise, calculate the feature values and return to the second part. Finally, save the strong classifier information as an XML file. After training, the numbers of strong classifier stages for the three features are 15, 16, and 15, respectively (an illustrative training command is given after Fig. 7).

Fig. 7. Flowchart of training
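
The training flow corresponds closely to OpenCV's stock `opencv_traincascade` tool, whose `LBP` feature type is in fact the MB-LBP variant of Sect. 2.2 and which still offered a `HOG` feature type in the 2.4.x series. The invocation below is an illustrative sketch: `pos2424.vec` and `neg.txt` are the sample files named above, while the sample counts, window size, and output paths are assumptions.

```python
import subprocess

# -maxFalseAlarmRate 0.5 and -numStages 30 mirror the initialization
# above; one run per feature type produces one cascade XML each.
for feat in ('HAAR', 'LBP', 'HOG'):
    subprocess.run(
        ['opencv_traincascade',
         '-data', 'cascade_' + feat.lower(),  # stage XMLs + cascade.xml
         '-vec', 'pos2424.vec', '-bg', 'neg.txt',
         '-numPos', '7000', '-numNeg', '24000',  # illustrative counts
         '-numStages', '30', '-maxFalseAlarmRate', '0.5',
         '-featureType', feat, '-w', '24', '-h', '24'],
        check=True)
```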

Real-time performance is evaluated by detection time, i.e. the time required to process a single image, obtained by processing the 135 images and taking the average. Accuracy is evaluated by the detection rate and the false rate, as shown in Table 1.

Table 1. Algorithm results

3.2 Result Analysis

From Table 1, in terms of detection time: multi-feature fusion < MB-LBP < Haar-like < HOG, so HOG requires the longest time. During classifier training, the average number of weak classifiers per strong-classifier stage is about 455 for HOG, compared with 35 for Haar-like and 15 for MB-LBP, which is why HOG's detection time is the longest. In terms of detection rate: HOG < multi-feature fusion < MB-LBP < Haar-like. Although the detection rate of multi-feature fusion after weighting and integration is lower than that of Haar-like and MB-LBP, its false rate is only 2.11%, filtering out most of the mistakenly detected non-face regions. Figure 8(a), (b), (c), and (d) show the detection results based on MB-LBP, Haar-like, HOG, and multi-feature fusion, respectively.

Fig. 8. Results of detection

From Fig. 8(a), after weighting and integration, the classifier based on multi-feature fusion can filter out non-face regions detected by a single feature, reducing the false rate; from Fig. 8(b), it can detect face regions that a single feature cannot, increasing the detection rate compared with HOG. As shown in Fig. 8(c), only one classifier detects the face region, so no integrated weight is calculated; as shown in Fig. 8(d), at least two single-feature classifiers detect the face but the integrated weight is below the threshold, leading to a decrease in the detection rate.

Face detection must both detect face regions correctly and exclude non-face regions, so a method needs not only to ensure high detection accuracy but also to keep the false rate as low as possible. In conclusion, the multi-feature fusion method is more suitable for multi-angle face detection.

4 Conclusion

Multi-angle face detection has grown into a hot topic in the field of face detection. This paper proposes a multi-angle face detection method based on multi-feature fusion. The method outperforms single-feature methods in terms of real-time performance and accuracy, because it combines the advantages of three features and, through voting and weighted calculation, yields refined face regions with high confidence. The simulation results (a detection rate of 88.39% and a false rate of 2.11%) show that the method is suitable for multi-angle face detection. Future work will focus on finding a more appropriate threshold that improves the detection rate while keeping the false rate low, and on applying the method to video sequences to accomplish multi-angle face detection in video.