
Ehsinet: Efficient High-Order Spatial Interaction Multi-task Network for Adaptive Autonomous Driving Perception


Abstract

Accurate detection of traffic objects, drivable areas, and lane lines is a core requirement for an autonomous driving system. Performing these tasks efficiently is difficult because the driving environment is complex and dynamic and lighting conditions vary. This paper proposes EHSINet, an efficient and versatile neural network architecture that adaptively addresses multiple tasks, including traffic object detection, drivable area segmentation, and lane line segmentation. EHSINet is a fully convolutional network that enables long-range, high-order spatial interactions between neighboring features without using vision transformers. Evaluated on the BDD100K and KITTI datasets, EHSINet outperforms state-of-the-art methods and performs especially well in complex environments, under changing illumination, and in severe weather. Tests in real-world scenarios further demonstrate its strong generalization and practical value. Code is available at https://github.com/Pepper-FlavoredChewingGum/EHSINet.
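
The high-order spatial interaction mechanism mentioned above is the kind of operation provided by recursive gated convolutions (as popularized by HorNet). The following is a minimal PyTorch sketch of such a block, included only to illustrate the idea; the interaction order, channel splits, and kernel size are illustrative assumptions and are not claimed to match the exact EHSINet design.

```python
# Minimal, illustrative sketch of a recursive gated convolution (gnConv-style)
# block. All hyperparameters here (order=3, 7x7 depth-wise kernel, halving
# channel splits) are assumptions for demonstration, not the EHSINet settings.
import torch
import torch.nn as nn


class HighOrderGatedConv(nn.Module):
    """Order-n gated convolution: repeated element-wise gating between
    depth-wise convolved feature splits yields high-order spatial
    interactions using only convolutions (no attention)."""

    def __init__(self, dim: int, order: int = 3, kernel_size: int = 7):
        super().__init__()
        # Channel widths grow toward the final order: [dim/2^(order-1), ..., dim/2, dim]
        # (dim must be divisible by 2^(order-1))
        self.dims = [dim // 2 ** i for i in range(order)][::-1]
        self.proj_in = nn.Conv2d(dim, 2 * dim, kernel_size=1)
        # One depth-wise convolution applied to all gating branches at once
        total = sum(self.dims)
        self.dwconv = nn.Conv2d(total, total, kernel_size,
                                padding=kernel_size // 2, groups=total)
        # Point-wise projections lifting the output of each order to the next width
        self.pws = nn.ModuleList(
            [nn.Conv2d(self.dims[i], self.dims[i + 1], kernel_size=1)
             for i in range(order - 1)]
        )
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.proj_in(x)                                    # (B, 2*dim, H, W)
        gate, feats = torch.split(y, (self.dims[0], sum(self.dims)), dim=1)
        feats = torch.split(self.dwconv(feats), self.dims, dim=1)
        out = gate * feats[0]                                  # first-order interaction
        for i, pw in enumerate(self.pws):
            out = pw(out) * feats[i + 1]                       # higher-order gating
        return self.proj_out(out)


if __name__ == "__main__":
    block = HighOrderGatedConv(dim=64, order=3)
    print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```

The repeated element-wise gating between depth-wise convolved channel groups is what realizes progressively higher-order spatial interactions while keeping the block fully convolutional.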


Data Availability

The code and datasets analysed during the current study are available in the EHSINet repository, https://github.com/Pepper-FlavoredChewingGum/EHSINet.


Funding

The authors did not receive support from any organization for the submitted work.

Author information

Authors and Affiliations

Authors

Contributions

The project design and paper writing were carried out by all authors. JY conceived the core ideas and principles of the paper. YL designed the experiments, trained the models, and wrote the paper. CL collected the data, while RT processed and documented it. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jianjun Yao.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Informed Consent

The research did not involve human participants or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yao, J., Li, Y., Liu, C. et al. Ehsinet: Efficient High-Order Spatial Interaction Multi-task Network for Adaptive Autonomous Driving Perception. Neural Process Lett 55, 11353–11370 (2023). https://doi.org/10.1007/s11063-023-11379-x

