Abstract
Streaming perception is a crucial task in autonomous driving: it aims to eliminate the inconsistency between perception results and the real environment that is caused by processing delay, and this inconsistency grows in high-speed driving scenarios. Previous research has overlooked streaming perception in high-speed driving scenarios and the robustness of models to object speed. To fill this gap, we first define the full-speed domain streaming perception problem and construct a real-time meta-detector, StreamTrack. Second, to extract motion trends, we propose the Swift Multi-Cost Tracker (SMCT) for fast and accurate data association, together with the Direct-Decoupled Prediction Head (DDPH) for predicting future object locations. Furthermore, we introduce the Uniform Motion Prior Loss (UMPL), which stabilizes learning for rapidly moving objects. Compared with a strong baseline, our model improves SAsAP (Speed-Adaptive streaming Average Precision) by 15.46%. Extensive experiments show that our approach achieves state-of-the-art performance on the full-speed domain streaming perception task.
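For readers unfamiliar with multi-cost data association, the minimal sketch below illustrates the general idea behind trackers of this kind: each track is extrapolated one frame ahead under a uniform-motion (constant-velocity) assumption, the extrapolation is combined with an IoU overlap term into a single cost matrix, and matching is solved as an assignment problem. The function names, cost weights, and motion-cost scaling here are illustrative assumptions for exposition, not the paper's actual SMCT formulation.

```python
# Minimal sketch of multi-cost data association (illustrative only; the
# weights, scale constant, and API below are assumptions, not the SMCT code).
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def center(box):
    return np.array([(box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0])

def associate(tracks, velocities, detections,
              w_iou=0.7, w_motion=0.3, max_cost=0.8, motion_scale=100.0):
    """Match tracks to detections via a combined IoU + motion cost matrix.

    tracks:      list of (x1, y1, x2, y2) boxes from the previous frame.
    velocities:  per-track (vx, vy) center velocity (uniform-motion prior).
    detections:  list of (x1, y1, x2, y2) boxes in the current frame.
    """
    cost = np.zeros((len(tracks), len(detections)))
    for i, (trk, vel) in enumerate(zip(tracks, velocities)):
        # Extrapolate the track center one frame ahead, assuming uniform motion.
        pred_center = center(trk) + np.asarray(vel)
        for j, det in enumerate(detections):
            iou_cost = 1.0 - iou(trk, det)
            motion_cost = min(np.linalg.norm(pred_center - center(det)) / motion_scale, 1.0)
            cost[i, j] = w_iou * iou_cost + w_motion * motion_cost
    rows, cols = linear_sum_assignment(cost)  # optimal (Hungarian) assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]
```

The point of combining the two terms is that pure overlap matching degrades when objects move quickly between frames, while a motion prior keeps the association anchored to where the object should be; a real streaming-perception tracker would add further cost terms and run this association at frame rate.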
Data Availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the Fundamental Research Funds for the Central Universities under Grant D5000220192, Shaanxi Natural Science Basic Research Program under Grant 2022JM-206, Xi’an Science and Technology Planning Project under Grant 21RGZN0008, Shaanxi Province Qin Chuang Yuan "scientists + engineers" team construction under Grant 2022KXJ-006, and Technology Innovation Guidance Project of Shaanxi Province under Grant 2023GXLH-100.
Author information
Contributions
Weizhen Ge: Conceptualization of this study, Methodology, Experiments, Writing - Original draft preparation. Junge Shen: Funding application, Manuscript revision, Writing - Review. Xin Wang: Funding application, Manuscript revision. Zhaoyong Mao: Funding application, Manuscript revision. Jing Ren: Manuscript revision.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Ethical and informed consent for data used
This study does not involve the collection or use of private data. All data used in this study were obtained from the open literature, public statistics, or the results of simulation experiments. Therefore, ethical approval and informed consent were not required for this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ge, W., Wang, X., Mao, Z. et al. StreamTrack: real-time meta-detector for streaming perception in full-speed domain driving scenarios. Appl Intell 54, 12177–12193 (2024). https://doi.org/10.1007/s10489-024-05748-9