Abstract
Streaming perception is a crucial task in autonomous driving: it aims to eliminate the inconsistency between perception results and the real environment that is caused by processing delay, and this inconsistency grows in high-speed driving scenarios. Previous research has overlooked streaming perception in high-speed driving scenarios and the robustness of models to object speed. To fill this gap, we first define the full-speed domain streaming perception problem and construct a real-time meta-detector, StreamTrack. Second, to extract motion trends, we propose the Swift Multi-Cost Tracker (SMCT) for fast and accurate data association, together with the Direct-Decoupled Prediction Head (DDPH) for predicting future object locations. Furthermore, we introduce the Uniform Motion Prior Loss (UMPL), which stabilizes learning for rapidly moving objects. Compared with a strong baseline, our model improves SAsAP (Speed-Adaptive streaming Average Precision) by 15.46%. Extensive experiments show that our approach achieves state-of-the-art performance on the full-speed domain streaming perception task.
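For readers unfamiliar with multi-cost data association, the minimal sketch below illustrates the general idea behind trackers of this kind: each track is extrapolated one frame ahead under a uniform-motion (constant-velocity) assumption, the extrapolation is combined with an IoU overlap term into a single cost matrix, and matching is solved as an assignment problem. The function names, cost weights, and motion-cost scaling here are illustrative assumptions for exposition, not the paper's actual SMCT formulation.

```python
# Minimal sketch of multi-cost data association (illustrative only; the
# weights, scale constant, and API below are assumptions, not the SMCT code).
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def center(box):
    return np.array([(box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0])

def associate(tracks, velocities, detections,
              w_iou=0.7, w_motion=0.3, max_cost=0.8, motion_scale=100.0):
    """Match tracks to detections via a combined IoU + motion cost matrix.

    tracks:      list of (x1, y1, x2, y2) boxes from the previous frame.
    velocities:  per-track (vx, vy) center velocity (uniform-motion prior).
    detections:  list of (x1, y1, x2, y2) boxes in the current frame.
    """
    cost = np.zeros((len(tracks), len(detections)))
    for i, (trk, vel) in enumerate(zip(tracks, velocities)):
        # Extrapolate the track center one frame ahead, assuming uniform motion.
        pred_center = center(trk) + np.asarray(vel)
        for j, det in enumerate(detections):
            iou_cost = 1.0 - iou(trk, det)
            motion_cost = min(np.linalg.norm(pred_center - center(det)) / motion_scale, 1.0)
            cost[i, j] = w_iou * iou_cost + w_motion * motion_cost
    rows, cols = linear_sum_assignment(cost)  # optimal (Hungarian) assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]
```

The point of combining the two terms is that pure overlap matching degrades when objects move quickly between frames, while a motion prior keeps the association anchored to where the object should be; a real streaming-perception tracker would add further cost terms and run this association at frame rate.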
Data Availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the Fundamental Research Funds for the Central Universities under Grant D5000220192, Shaanxi Natural Science Basic Research Program under Grant 2022JM-206, Xi’an Science and Technology Planning Project under Grant 21RGZN0008, Shaanxi Province Qin Chuang Yuan "scientists + engineers" team construction under Grant 2022KXJ-006, and Technology Innovation Guidance Project of Shaanxi Province under Grant 2023GXLH-100.
Author information
Contributions
Weizhen Ge: Conceptualization of this study, Methodology, Experiments, Writing - Original draft preparation. Junge Shen: Funding application, Manuscript revision, Writing - Review. Xin Wang: Funding application, Manuscript revision. Zhaoyong Mao: Funding application, Manuscript revision. Jing Ren: Manuscript revision.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Ethical and informed consent for data used
This study does not involve the collection or use of private data. All data used in this study were obtained from the open literature, public statistics, or the results of simulation experiments. Therefore, ethical approval and informed consent were not required for this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ge, W., Wang, X., Mao, Z. et al. StreamTrack: real-time meta-detector for streaming perception in full-speed domain driving scenarios. Appl Intell 54, 12177–12193 (2024). https://doi.org/10.1007/s10489-024-05748-9