DOI: 10.1145/3503161.3548162
research-article

APPTracker: Improving Tracking Multiple Objects in Low-Frame-Rate Videos

Published: 10 October 2022

Abstract

Multi-object tracking (MOT) on low-frame-rate videos is a promising way to deploy MOT methods on edge devices with limited computing, storage, power, and transmission bandwidth. Tracking at a low frame rate poses particular challenges in the association stage, because objects in two successive frames typically exhibit much larger changes in location, velocity, appearance, and visibility than at normal frame rates. In this paper, we observe that many existing association strategies suffer severe performance degradation under such variations. Although optical-flow-based methods such as CenterTrack can handle large displacements to some extent thanks to their large receptive field, their temporally local nature makes them produce incorrect displacement estimates for objects whose visibility flips between adjacent frames. To overcome this locality, we propose an online tracking method that extends the CenterTrack architecture with a new head, named APP, which recognizes unreliable displacement estimates. We then design a two-stage association policy in which either the displacement estimates or historical motion cues are used in the corresponding stage, according to the APP predictions. With little additional computational overhead, our method robustly preserves identities in low-frame-rate video sequences. Experimental results on public datasets under various low-frame-rate settings demonstrate the advantages of the proposed method.
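The two-stage association policy summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `Detection.unreliable` field is a hypothetical stand-in for the APP head's output. The threshold `tau`, the gating distance `max_dist`, the greedy distance-ordered matching, and the constant-velocity motion model used as the "historical motion cue" are all assumptions chosen for illustration. Following the CenterTrack convention, `offset` is read as the motion since the previous frame, so a detection's previous position is `center - offset`.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Track:
    track_id: int
    center: Point    # object center in the previous sampled frame
    velocity: Point  # per-step velocity estimated from the track history


@dataclass
class Detection:
    center: Point      # object center in the current frame
    offset: Point      # predicted motion since the previous frame
    unreliable: float  # hypothetical APP-style score: P(offset is wrong)


def greedy_match(queries, targets, max_dist):
    """Pair (index, point) items greedily by increasing Euclidean distance."""
    candidates = sorted(
        (math.dist(qp, tp), qi, ti)
        for qi, qp in queries
        for ti, tp in targets
    )
    used_q, used_t, matches = set(), set(), []
    for dist, qi, ti in candidates:
        if dist > max_dist:
            break  # every remaining pair is at least this far apart
        if qi in used_q or ti in used_t:
            continue
        used_q.add(qi)
        used_t.add(ti)
        matches.append((qi, ti))
    return matches


def two_stage_associate(tracks, dets, tau=0.5, max_dist=50.0):
    """Return ({detection index: track_id}, [indices of unmatched detections])."""
    assigned = {}  # detection index -> track index

    # Stage 1: use the displacement estimate only where the score marks it
    # reliable, projecting each such detection back to the previous frame.
    stage1_queries = [
        (i, (d.center[0] - d.offset[0], d.center[1] - d.offset[1]))
        for i, d in enumerate(dets)
        if d.unreliable < tau
    ]
    stage1_targets = [(j, t.center) for j, t in enumerate(tracks)]
    for i, j in greedy_match(stage1_queries, stage1_targets, max_dist):
        assigned[i] = j

    # Stage 2: leftover detections are matched against constant-velocity
    # predictions of the tracks that are still unmatched.
    used_tracks = set(assigned.values())
    stage2_queries = [(i, d.center) for i, d in enumerate(dets) if i not in assigned]
    stage2_targets = [
        (j, (t.center[0] + t.velocity[0], t.center[1] + t.velocity[1]))
        for j, t in enumerate(tracks)
        if j not in used_tracks
    ]
    for i, j in greedy_match(stage2_queries, stage2_targets, max_dist):
        assigned[i] = j

    matches = {i: tracks[j].track_id for i, j in assigned.items()}
    unmatched = [i for i in range(len(dets)) if i not in assigned]
    return matches, unmatched


if __name__ == "__main__":
    tracks = [Track(1, (100.0, 100.0), (40.0, 0.0)), Track(2, (300.0, 100.0), (0.0, 0.0))]
    dets = [
        Detection((140.0, 100.0), (40.0, 0.0), 0.1),   # reliable displacement
        Detection((300.0, 105.0), (200.0, 5.0), 0.9),  # displacement points at the wrong object
        Detection((600.0, 600.0), (0.0, 0.0), 0.2),    # newly appearing object
    ]
    print(two_stage_associate(tracks, dets))  # prints ({0: 1, 1: 2}, [2])
```

In this toy run, detection 1's displacement estimate points back to track 1's old position, the kind of error that could arise when an object's visibility flips. Because its score exceeds `tau`, the sketch ignores that estimate and associates the detection in stage 2 using track 2's motion prediction. Detection 2 matches nothing and would start a new track.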

Supplementary Material

MP4 File (MM22-fp1699.mp4)
Presentation video

    Information

    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN:9781450392037
    DOI:10.1145/3503161
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 October 2022

    Author Tags

    1. low-frame-rate videos
    2. multi-object tracking
    3. occlusion handling

    Qualifiers

    • Research-article

    Funding Sources

    • the 5G Open Laboratory of Hangzhou Future Sci-Tech City
    • the Fundamental Research Funds for the Central Universities
    • NSFC

    Conference

    MM '22

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 48
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 01 Mar 2025

    Citations

    Cited By

    • (2024) Predictive and Near-Optimal Sampling for View Materialization in Video Databases. Proceedings of the ACM on Management of Data 2:1 (1-27). DOI: 10.1145/3639274. Online publication date: 26-Mar-2024.
    • (2024) FocoTrack: Multi Object Tracking by Focusing On Overlap at Low Frame Rate. 2024 IEEE International Conference on Robotics and Automation (ICRA), 16222-16228. DOI: 10.1109/ICRA57147.2024.10610679. Online publication date: 13-May-2024.
    • (2024) APPTracker+: Displacement Uncertainty for Occlusion Handling in Low-Frame-Rate Multiple Object Tracking. International Journal of Computer Vision. DOI: 10.1007/s11263-024-02237-x. Online publication date: 3-Nov-2024.
    • (2023) Multiobject Tracking via Discriminative Embeddings for the Internet of Things. IEEE Internet of Things Journal 10:12 (10532-10546). DOI: 10.1109/JIOT.2023.3242739. Online publication date: 15-Jun-2023.
    • (2023) InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 9079-9085. DOI: 10.1109/IROS55552.2023.10341690. Online publication date: 1-Oct-2023.
    • (2023) Collaborative Tracking Learning for Frame-Rate-Insensitive Multi-Object Tracking. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 9930-9939. DOI: 10.1109/ICCV51070.2023.00914. Online publication date: 1-Oct-2023.
    • (2023) F&F Attack: Adversarial Attack against Multiple Object Trackers by Inducing False Negatives and False Positives. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 4550-4560. DOI: 10.1109/ICCV51070.2023.00422. Online publication date: 1-Oct-2023.
