Abstract
Studying how drivers allocate their attention while driving is critical to achieving human-like cognitive ability in autonomous vehicles, and it has become an active topic in the human–machine augmented intelligence community for self-driving. However, existing state-of-the-art methods for driver attention prediction are mainly built on convolutional neural networks (CNNs), whose local receptive fields limit their ability to capture long-range dependencies. In this work, we propose a novel attention prediction method based on CNNs and Transformers, termed ACT-Net. In particular, a CNN and a Transformer are combined into a block, and such blocks are stacked to form the deep model. Through this design, the model captures both local and long-range dependencies, both of which are crucial for driver attention prediction. Extensive comparison experiments against other state-of-the-art techniques on the widely used BDD-A dataset and on privately collected data from BDD-X validate the effectiveness of the proposed ACT-Net.
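The block design described in the abstract can be pictured with a short sketch. The following PyTorch-style code is a minimal illustration, not the authors' ACT-Net implementation: the layer choices, channel widths, and the way convolution and self-attention are fused inside each block are assumptions made only to show how stacking such blocks combines local and long-range dependencies.

```python
# Minimal sketch of a stacked CNN + Transformer block (assumed design,
# not the published ACT-Net architecture).
import torch
import torch.nn as nn


class ConvTransformerBlock(nn.Module):
    """Local convolution followed by global self-attention (illustrative)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local dependencies: 3x3 convolution with a residual connection.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Long-range dependencies: multi-head self-attention over all
        # spatial positions, treated as a token sequence.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.conv(x)                   # local receptive field
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attn_out)  # global context
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class AttentionPredictor(nn.Module):
    """Stacks the blocks and regresses a single-channel attention map."""

    def __init__(self, in_channels: int = 3, width: int = 64, depth: int = 4):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, width, kernel_size=7, stride=4, padding=3)
        self.blocks = nn.Sequential(*[ConvTransformerBlock(width) for _ in range(depth)])
        self.head = nn.Conv2d(width, 1, kernel_size=1)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        feat = self.blocks(self.stem(frame))
        return torch.sigmoid(self.head(feat))  # coarse driver-attention map


if __name__ == "__main__":
    model = AttentionPredictor()
    saliency = model(torch.randn(1, 3, 224, 224))  # -> (1, 1, 56, 56)
    print(saliency.shape)
```

In this sketch the convolution supplies the local receptive field and the self-attention layer lets every spatial position attend to every other one, which is the intuition behind pairing the two operations in a single repeated block.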








Acknowledgements
Project supported by the National Key R&D Program of China (No. 2020YFB1600400), the National Natural Science Foundation of China (No. 61806198), the Key Research and Development Program of Guangzhou (No. 202007050002), and the Shenzhen Science and Technology Program (Grant No. RCBS20200714114920272).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Gou, C., Zhou, Y. & Li, D. Driver attention prediction based on convolution and transformers. J Supercomput 78, 8268–8284 (2022). https://doi.org/10.1007/s11227-021-04151-2