StreamTrack: real-time meta-detector for streaming perception in full-speed domain driving scenarios

Abstract

Streaming perception is a crucial task in autonomous driving that aims to eliminate the inconsistency between perception results and the real environment caused by processing delay. In high-speed driving scenarios, this inconsistency becomes larger. Previous research has overlooked streaming perception in high-speed driving scenarios and the robustness of models to object speed. To fill this gap, we first define the full-speed domain streaming perception problem and construct a real-time meta-detector, StreamTrack. Second, to extract motion trends, we propose the Swift Multi-Cost Tracker (SMCT) for fast and accurate data association. Meanwhile, the Direct-Decoupled Prediction Head (DDPH) is introduced to predict future locations. Furthermore, we introduce the Uniform Motion Prior Loss (UMPL), which ensures stable learning for rapidly moving objects. Compared with a strong baseline, our model improves SAsAP (Speed-Adaptive streaming Average Precision) by 15.46%. Extensive experiments show that our approach achieves state-of-the-art performance on the full-speed domain streaming perception task.
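
To make the delay-compensation idea behind streaming perception concrete, the sketch below (in Python) illustrates the simplest form of a uniform-motion assumption: detections from the last processed frame are extrapolated with a constant per-frame velocity so that the emitted boxes line up with the scene at the moment the result is actually output. This is only an illustrative sketch of the general principle, not the authors' SMCT/DDPH pipeline; the function extrapolate_boxes and its arguments are assumptions introduced here.

import numpy as np

def extrapolate_boxes(boxes_prev, boxes_curr, dt_frames, latency_frames):
    # boxes_prev, boxes_curr: (N, 4) arrays [x1, y1, x2, y2] for the same
    # tracked objects at two consecutive processed frames.
    # dt_frames: frame gap between the two observations.
    # latency_frames: how far ahead (in frames) the output must land so that
    # it matches the world state when the result is finally emitted.
    velocity = (boxes_curr - boxes_prev) / dt_frames    # per-frame box motion
    return boxes_curr + velocity * latency_frames       # constant-velocity forecast

# Toy usage: one object moving 8 px per frame to the right, 2 frames of latency.
prev = np.array([[100.0, 50.0, 140.0, 90.0]])
curr = np.array([[108.0, 50.0, 148.0, 90.0]])
print(extrapolate_boxes(prev, curr, dt_frames=1, latency_frames=2))
# [[124. 50. 164. 90.]] -- boxes shifted to where the object should be when emitted

Faster objects make the gap between the stale detection and the true position larger, which is why such motion-aware extrapolation matters most in the high-speed part of the full-speed domain.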


Data Availability

The datasets generated during or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities under Grant D5000220192, Shaanxi Natural Science Basic Research Program under Grant 2022JM-206, Xi’an Science and Technology Planning Project under Grant 21RGZN0008, Shaanxi Province Qin Chuang Yuan "scientists + engineers" team construction under Grant 2022KXJ-006, and Technology Innovation Guidance Project of Shaanxi Province under Grant 2023GXLH-100.

Author information

Authors and Affiliations

Authors

Contributions

Weizhen Ge: Conceptualization of this study, Methodology, Experiments, Writing - Original draft preparation. Junge Shen: Funding application, Manuscript revision, Writing - Review. Xin Wang: Funding application, Manuscript revision. Zhaoyong Mao: Funding application, Manuscript revision. Jing Ren: Manuscript revision.

Corresponding author

Correspondence to Junge Shen.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Ethical and informed consent for data used

This study does not involve the collection or use of private data. All data used in this study were obtained from the open literature, publicly available statistics, or simulation experiments. Therefore, ethical approval and informed consent were not required for this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Ge, W., Wang, X., Mao, Z. et al. StreamTrack: real-time meta-detector for streaming perception in full-speed domain driving scenarios. Appl Intell 54, 12177–12193 (2024). https://doi.org/10.1007/s10489-024-05748-9
