
ESDAR-net: towards high-accuracy and real-time driver action recognition for embedded systems

Published in: Multimedia Tools and Applications

Abstract

Existing driver action recognition approaches face a fundamental trade-off between recognition accuracy and computational efficiency: high-capacity spatial-temporal deep learning models cannot run in real time on vehicle-mounted devices. To overcome this limitation, this paper proposes ESDAR-Net, a driver action recognition solution designed for embedded systems. ESDAR-Net is a multi-branch deep learning framework that operates directly on compressed videos. To reduce computational cost, a lightweight 2D/3D convolutional network performs the spatial-temporal modeling. Two strategies boost recognition accuracy: (1) a cross-layer connection module (CLCM) and a spatial-temporal trilinear pooling module (STTPM) adaptively fuse appearance and motion information; (2) complementary knowledge from a high-capacity spatial-temporal deep learning model is distilled and transferred to ESDAR-Net. Experimental results show that ESDAR-Net achieves both high accuracy and real-time performance: 98.7% on SEU-DAR-V1 and 96.5% on SEU-DAR-V2, with 2.19M learnable parameters, 0.253G FLOPs, and a throughput of 27 clips/s on an NVIDIA Jetson TX2.
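The second strategy, transferring complementary knowledge from a high-capacity teacher to the lightweight ESDAR-Net student, follows the standard soft-target knowledge-distillation recipe. The abstract does not give the paper's exact objective, so the sketch below is only a minimal illustration; the temperature `T`, mixing weight `alpha`, and function name are assumptions, not the authors' settings.

```python
# Minimal sketch of a soft-target distillation loss (Hinton et al., 2015):
# the student (ESDAR-Net) is trained against both the hard action labels
# and the temperature-softened predictions of a high-capacity teacher.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 4.0,        # assumed temperature
                      alpha: float = 0.7):   # assumed soft/hard weighting
    # KL divergence between temperature-softened distributions; the T^2
    # factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Ordinary cross-entropy against the ground-truth action labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```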
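For the first strategy, the STTPM fuses multiple streams via trilinear pooling. Its actual architecture is not described in the abstract, so the following is a hedged sketch of *factorized* trilinear pooling in the spirit of factorized bilinear pooling: three streams (for example, I-frame appearance, motion vectors, and residuals from the compressed video) are projected into a shared factor space, multiplied element-wise, and sum-pooled. The class name, stream choices, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedTrilinearPooling(nn.Module):
    """Hedged sketch of a factorized three-way feature fusion."""
    def __init__(self, d_app, d_mv, d_res, factor_dim=512, k=4):
        super().__init__()
        self.proj_app = nn.Linear(d_app, factor_dim * k)
        self.proj_mv = nn.Linear(d_mv, factor_dim * k)
        self.proj_res = nn.Linear(d_res, factor_dim * k)
        self.factor_dim, self.k = factor_dim, k

    def forward(self, app, mv, res):
        # Element-wise product in the shared factor space approximates a
        # trilinear interaction between the three streams.
        z = self.proj_app(app) * self.proj_mv(mv) * self.proj_res(res)
        # Sum-pool groups of k factors, then apply signed square-root and
        # L2 normalization to stabilize the fused feature.
        z = z.view(z.size(0), self.factor_dim, self.k).sum(dim=2)
        z = torch.sign(z) * torch.sqrt(torch.abs(z) + 1e-12)
        return F.normalize(z, dim=1)

# Example with assumed dimensions: fuse 1024-d appearance features with
# 256-d motion-vector and 256-d residual features into a 512-d descriptor.
# fusion = FactorizedTrilinearPooling(1024, 256, 256)
# out = fusion(torch.randn(8, 1024), torch.randn(8, 256), torch.randn(8, 256))
```

A full trilinear outer product of three d-dimensional features would be cubic in d; the factorized form keeps the fusion cost linear in the projection width, which matters on an embedded target such as the Jetson TX2.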


Data Availability

The data that support the findings of this study are not publicly available due to portrait-rights restrictions.


Acknowledgements

The authors would like to thank the editor and the anonymous reviewers for their valuable comments and constructive suggestions.

Funding

This work was supported in part by the National Natural Science Foundation of China (No. 62203012, No. 61871123 and No. 61901221), the Open Research Fund of the Anhui Key Laboratory of Detection Technology and Energy Saving Devices (No. JCKJ2022A07), the Anhui Polytechnic University Introduced Talent Research Startup Fund (No. 2022YQQ009), the Youth Foundation of Anhui Polytechnic University (No. Xjky2022039), the Anhui Province Higher Education Quality Engineering Project (No. 2022jyxm139 and No. 2022kcsz027), the Anhui University Collaborative Innovation Project (No. GXXT-2020-0069) and the Anhui Natural Science Foundation Project (No. 2108085MF220).

Author information


Corresponding authors

Correspondence to Huicheng Yang or Xiaobo Lu.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hu, Y., Shuai, Z., Yang, H. et al. ESDAR-net: towards high-accuracy and real-time driver action recognition for embedded systems. Multimed Tools Appl 83, 18281–18307 (2024). https://doi.org/10.1007/s11042-023-15777-0
