Abstract
Distracted driving is the act of driving while engaged in other activities, such as using a cell phone, texting, eating, or reading, which take the driver's attention away from the road. Deep-learning-based distracted driving detection models can extract critical information from video data to characterize driving behavior. However, methods based solely on appearance features cannot fully eliminate the noise that complex environments introduce into the model, and methods based solely on skeletal information cannot recognize joint actions of the human body and objects. Developing an accurate distracted driving detection model therefore remains challenging. In this paper, we propose MFD-former, a distracted driving recognition model based on the fusion of posture and appearance. First, a feature extraction module extracts skeleton data (i.e., posture) and appearance features (i.e., descriptors), which are merged by a graph neural network. Then, the two kinds of information are input into the MFD-former encoder module, where the self-attention mechanism efficiently extracts information from the sparse data. Finally, the distracted driving classification results are obtained by extracting the classification labels through the MLP Head. The MFD-former model outperforms existing models, achieving 95.1% accuracy on the State Farm dataset and 90.24% accuracy on the self-built Train Drivers dataset.
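The pipeline described in the abstract (posture and appearance tokens, a self-attention encoder, a classification head) can be illustrated with a minimal pure-Python sketch. This is not the authors' MFD-former implementation. Every name, dimension, and weight below is a hypothetical placeholder, and three simplifications are assumed: the two token sets are simply concatenated instead of being merged by a graph neural network, the encoder is a single attention head, and mean pooling with one linear layer stands in for the class label and MLP Head.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    """Random weights; stands in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over token vectors."""
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d)])
    return out

def classify(posture_tokens, appearance_tokens, params):
    """Fuse both token sets, attend, pool, and return class probabilities."""
    # Assumption: plain concatenation replaces the paper's GNN-based merge.
    tokens = posture_tokens + appearance_tokens
    attended = self_attention(tokens, params["Wq"], params["Wk"], params["Wv"])
    # Assumption: mean pooling replaces the class label used by the MLP Head.
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    return softmax(matvec(params["W_head"], pooled))

rng = random.Random(0)
DIM = 8           # hypothetical token dimension
NUM_CLASSES = 10  # the State Farm dataset defines ten driver-behavior classes
params = {
    "Wq": rand_matrix(DIM, DIM, rng),
    "Wk": rand_matrix(DIM, DIM, rng),
    "Wv": rand_matrix(DIM, DIM, rng),
    "W_head": rand_matrix(NUM_CLASSES, DIM, rng),
}
# Hypothetical inputs: 5 posture (joint) tokens and 3 appearance tokens.
posture = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
appearance = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]
probs = classify(posture, appearance, params)
print(len(probs), round(sum(probs), 6))
```

Because attention scores are computed between every pair of tokens, each posture token can attend to each appearance token and vice versa, which is one plausible way the two modalities could inform each other inside the encoder.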
Acknowledgement
This work was supported by the Joint Fund of the Natural Science Foundation of Anhui Province in 2020 (2008085UD08), the Anhui Provincial Key R&D Program (202004a05020004), the Open Fund of the Intelligent Interconnected Systems Laboratory of Anhui Province (PA2021AKSK0107), and the Intelligent Networking and New Energy Vehicle Special Project of the Intelligent Manufacturing Institute of HFUT (IMIWL2019003, IMIDC2019002).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yu, H. et al. (2022). Posture and Appearance Fusion Network for Driver Distraction Recognition. In: Wang, L., Segal, M., Chen, J., Qiu, T. (eds) Wireless Algorithms, Systems, and Applications. WASA 2022. Lecture Notes in Computer Science, vol 13471. Springer, Cham. https://doi.org/10.1007/978-3-031-19208-1_14
DOI: https://doi.org/10.1007/978-3-031-19208-1_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19207-4
Online ISBN: 978-3-031-19208-1
eBook Packages: Computer Science, Computer Science (R0)