Abstract
Driver distraction recognition plays a fundamental role in road safety. In this paper, we present a modular architecture that fuses key point and object detections to predict the driver's actions. From multi-camera infrared recordings, we temporally localize a variety of actions that lead to distraction. Our system detects objects of interest and extracts key points from the driver. The two are fused by generating features that relate them, and these features are processed by an ML-based classification algorithm. Finally, filters are applied to reduce bouncing between predictions and to add temporal context to the detections. Our proposal has been validated on two state-of-the-art driving distraction datasets. Through several experiments, we show that fusion substantially improves the inference of related actions as well as domain adaptation. In addition, our framework is lightweight and explainable, and it has low latency because it performs frame-by-frame inference. The modularity of the network allows us to upgrade parts independently or remove a camera without modifying the entire network.
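The fusion and filtering steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the key point and object names, the use of Euclidean distances as relational features, the `MISSING` placeholder, and the majority-vote filter are all assumptions chosen for illustration. The abstract states only that features relating key points and objects are generated, classified, and then filtered over time.

```python
from collections import Counter, deque
import math

# Hypothetical names and layouts; none of these come from the paper.
KEYPOINTS = ("nose", "left_wrist", "right_wrist")
OBJECTS = ("phone", "bottle")
MISSING = -1.0  # assumed placeholder when a key point or object is not detected


def box_center(box):
    """Return the center (x, y) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def fuse_features(keypoints, objects):
    """Build one per-frame feature vector of key point-to-object distances.

    keypoints: dict mapping key point name -> (x, y)
    objects:   dict mapping object class -> (x1, y1, x2, y2)
    """
    features = []
    for obj_name in OBJECTS:
        for kp_name in KEYPOINTS:
            if obj_name in objects and kp_name in keypoints:
                cx, cy = box_center(objects[obj_name])
                kx, ky = keypoints[kp_name]
                features.append(math.hypot(cx - kx, cy - ky))
            else:
                features.append(MISSING)
    return features


class MajorityFilter:
    """Sliding-window majority vote over per-frame labels (assumed filter)."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def update(self, label):
        # Keep only the most recent `window` labels and return the most common.
        self.history.append(label)
        return Counter(self.history).most_common(1)[0][0]


# Usage: one frame's features, then smoothing of a noisy label stream.
frame_features = fuse_features(
    {"nose": (0.0, 0.0), "right_wrist": (3.0, 4.0)},
    {"phone": (2.0, 3.0, 4.0, 5.0)},
)
smoother = MajorityFilter(window=3)
smoothed = [smoother.update(l) for l in ["drive", "drive", "phone", "drive", "drive"]]
```

In this sketch, the per-frame feature vector would be passed to any frame-level classifier, and the filter would be applied to the classifier's output labels, so that an isolated misclassification (the single `"phone"` frame above) does not change the reported action.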
Acknowledgements
This work has been supported by the Spanish PID2021-126623OB-I00 project, funded by MICIN/AEI and FEDER; by the TED2021-130131A-I00 and PDC2022-133470-I00 projects, funded by MICIN/AEI and the European Union NextGenerationEU/PRTR; and by the collaboration scholarship for the 2022–2023 academic year (22C01/007899), financed by the Ministry of Education.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pardo-Decimavilla, P., Bergasa, L.M., López-Guillén, E., Llamazares, Á., Abdeselam, N., Ocaña, M. (2024). Driver Activity Recognition by Fusing Multi-object and Key Points Detection. In: Marques, L., Santos, C., Lima, J.L., Tardioli, D., Ferre, M. (eds) Robot 2023: Sixth Iberian Robotics Conference. ROBOT 2023. Lecture Notes in Networks and Systems, vol 976. Springer, Cham. https://doi.org/10.1007/978-3-031-58676-7_12
DOI: https://doi.org/10.1007/978-3-031-58676-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-58675-0
Online ISBN: 978-3-031-58676-7
eBook Packages: Intelligent Technologies and Robotics (R0)