Real-time head pose estimation using multi-task deep neural network
Introduction
Driver inattention is a major cause of traffic accidents. According to the National Highway Traffic Safety Administration (NHTSA), many of the traffic fatalities and injuries in the United States during the past two years were caused by driver inattention. About 3400 of the 35,092 US traffic deaths in 2015 were caused by driver distraction, 8.8% more than the 3197 deaths from the same cause in 2014. In other words, traffic accidents result far more often from driver error than from a defect in the vehicle or the road. Therefore, if driver inattention is detected automatically, an accident can be avoided by warning the driver in advance. Driver inattention occurs mainly when the driver is distracted or tired; in such cases, the driver adopts a different head pose than usual, so inattention can be detected before an accident occurs. Head pose estimation therefore plays an important role in active safety and advanced driver assistance systems (ADAS) in intelligent vehicles.
From the viewpoint of computer vision, head pose estimation is the process of inferring the position and orientation (yaw, pitch, and roll) of the head from input face images. Existing approaches can be roughly classified into two types: generative methods and discriminative methods. Generative methods use geometric cues or a deformable face model. These methods output continuous head pose values rather than discrete categories, and they have the advantage of also providing facial landmarks for various applications. However, since they rely heavily on the detection of facial feature points, the estimated head pose becomes less reliable in environments where facial feature points are difficult to detect, such as under large variations in head pose or facial expression, occlusion, noise, blur, and low image resolution. Discriminative methods use machine learning along with visual features of the entire face. These methods are robust to challenging head poses and low-resolution images. However, most of them divide facial images into fixed head pose intervals and classify input images into the corresponding categories, so the estimates are quantized at large intervals (usually over 10°) rather than being continuous values.
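To make the yaw/pitch/roll parameterization concrete, the sketch below (not the paper's code) recovers the three angles from a 3×3 head rotation matrix under the common Z-Y-X Euler convention; the matrix `R_yaw`, a head turned 30° in yaw, is a hypothetical example:

```python
import numpy as np

def rotation_to_euler(R):
    """Recover (yaw, pitch, roll) in degrees from a 3x3 rotation matrix,
    assuming the common Z-Y-X (yaw-pitch-roll) convention."""
    pitch = np.arcsin(-R[2, 0])
    yaw = np.arctan2(R[1, 0], R[0, 0])
    roll = np.arctan2(R[2, 1], R[2, 2])
    return np.degrees([yaw, pitch, roll])

# Hypothetical example: a head turned 30 degrees in yaw only.
c, s = np.cos(np.radians(30)), np.sin(np.radians(30))
R_yaw = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
yaw, pitch, roll = rotation_to_euler(R_yaw)  # yaw is 30, pitch and roll are 0
```

Generative methods typically estimate such a rotation by fitting a model to detected landmarks, while discriminative methods predict the angles (or an angle category) directly from image features.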
Head pose estimation is a challenging problem in practice. Lighting changes, severe vibrations, and large pose changes occur frequently in a vehicle, and these affect the appearance of the driver's face. In addition, the head pose must be computed in real time so that the driver can be warned promptly. This paper addresses these problems using a multi-task deep learning method. Our approach uses low-resolution grayscale images for real-time computation. Qualitative and quantitative evaluations show that the proposed method outperforms existing methods.
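The setup described above, a single network over low-resolution grayscale crops with one output for detection and one for continuous pose, can be sketched as a forward pass. This is a hypothetical NumPy illustration, not the paper's architecture; the layer sizes (32×32 input, 64-d shared feature) and random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 32x32 grayscale crop, 64-d shared feature.
IMG, FEAT = 32 * 32, 64
W_shared = rng.normal(0, 0.01, (FEAT, IMG))  # shared trunk weights
W_face   = rng.normal(0, 0.01, (1, FEAT))    # head 1: face/non-face logit
W_pose   = rng.normal(0, 0.01, (3, FEAT))    # head 2: yaw, pitch, roll

def forward(img):
    """One shared representation feeds both task heads (the multi-task idea)."""
    x = img.reshape(-1) / 255.0           # low-res grayscale input
    h = np.maximum(0.0, W_shared @ x)     # shared feature (ReLU)
    face_logit = (W_face @ h)[0]          # detection head (classification)
    angles = W_pose @ h                   # pose head (continuous regression)
    return face_logit, angles

face_logit, angles = forward(rng.integers(0, 256, (32, 32)))
```

The key property is that both outputs are computed from the same shared feature, so the detection and pose tasks regularize each other during training.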
Section snippets
Related work
There have been several approaches to head pose estimation using an image. This section presents the related studies according to approaches, and discusses the advantages and disadvantages of representative algorithms in each category.
Proposed method and datasets
In this section, we first give an overview of the proposed multi-view face detection and head pose estimation algorithm. The next sections discuss the details of the multi-task learning network and datasets.
Experimental results
In this section, we demonstrate our real-time head pose estimation algorithm. We report experiments that we conducted on various aspects to quantitatively and qualitatively verify the performance of the proposed algorithm. We verified the validity of the proposed multi-task learning methodology, tested the performance of the proposed algorithm in face detection and head pose estimation, and compared it with those of state-of-the-art algorithms. Finally, we present the results of our algorithm
Conclusion
In this paper, we proposed a multi-task learning-based real-time deep learning framework that can robustly estimate a driver’s head pose using images obtained under poor conditions in various vehicle environments. We also introduced a method that trains multi-task learning DNN with individual datasets, even if there are no jointly annotated datasets. Compared with the single DNN-based learning method, the proposed multi-task learning-based system showed better accuracy without overfitting to
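The training idea summarized above, one multi-task network trained from individually annotated datasets, can be sketched as an alternating batch schedule in which each step backpropagates only the loss of the head whose labels that batch carries. The dataset contents and batch counts below are hypothetical:

```python
# Hypothetical mini-batches: each dataset annotates only one task.
detection_batches = [{"task": "det"} for _ in range(3)]
pose_batches = [{"task": "pose"} for _ in range(3)]

def training_schedule(det, pose):
    """Alternate batches so every step updates the shared trunk plus
    exactly one task head: the one whose labels the batch carries."""
    for b_det, b_pose in zip(det, pose):
        yield b_det   # backpropagate detection loss only
        yield b_pose  # backpropagate pose loss only

schedule = list(training_schedule(detection_batches, pose_batches))
tasks = [b["task"] for b in schedule]
```

Under this scheme the shared layers see every dataset, while each task head is updated only on batches that carry its labels, which is what allows training without a jointly annotated dataset.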
Acknowledgments
This research was supported by the Ministry of Trade, Industry and Energy and the Korea Evaluation Institute of Industrial Technology (KEIT) under program number 10060110.
Byungtae Ahn received a B.S. degree in Electronic Engineering from Kumoh National Institute of Technology, Korea, in 2007, and an M.S. degree in Bio-mechatronics from Sungkyunkwan University, Korea, in 2011. He is currently working toward a Ph.D. degree in the Robotics Program at KAIST. He received a Qualcomm Innovation Award in 2013 and has been listed in Marquis Who's Who in the World, 2016. His research interests include deep learning, Human–Robot Interaction (HRI), and Advanced Driver Assistance Systems (ADAS). He is a student member of the IEEE.
Dong-Geol Choi received the B.S. and M.S. degrees in Electrical Engineering and Computer Science from Hanyang University in 2005 and 2007, respectively, and the Ph.D. degree in the Robotics Program from KAIST in 2016. He is currently a post-doctoral researcher at the Information & Electronics Research Institute at KAIST. His research interests include sensor fusion, autonomous robotics, and artificial intelligence. Dr. Choi received a fellowship award from the Qualcomm Korea R&D Center in 2013. He was a member of 'Team KAIST,' which won first place in the DARPA Robotics Challenge Finals 2015. He is a member of the IEEE.
Jaesik Park received his Bachelor's degree (summa cum laude) in media communication engineering from Hanyang University in 2009. He received his Master's degree and Ph.D. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST) in 2011 and 2015, respectively. He joined Intel Labs as a research scientist in 2015. His research interests include depth map refinement and image-based 3D reconstruction. He is a member of the IEEE.
In So Kweon received the B.S. and M.S. degrees in Mechanical Design and Production Engineering from Seoul National University, Seoul, Korea, in 1981 and 1983, respectively, and the Ph.D. degree in Robotics from the Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, in 1990. He worked for the Toshiba R&D Center, Japan, and joined the Department of Automation and Design Engineering, KAIST, Seoul, Korea, in 1992, where he is now a professor with the Department of Electrical Engineering. His research interests are sensor fusion, color modeling and analysis, visual tracking, and visual SLAM. He was the general chair of the Asian Conference on Computer Vision 2012 and is on the honorary board of the International Journal of Computer Vision (IJCV). Since 2010 he has served as a director of the Personal Plug and Play DigiCar Center, one of the National Core Research Centers. He was a member of 'Team KAIST,' which won first place in the DARPA Robotics Challenge Finals 2015. He is a member of the IEEE.
1. This work was done while J. Park was with the Robotics and Computer Vision Lab. He is currently with Intel Labs, 2200 Mission College Blvd., Santa Clara, CA 95054-1549, USA.