
DenseTrack: Drone-Based Crowd Tracking via Density-Aware Motion-Appearance Synergy

Published: 28 October 2024

Abstract

Drone-based crowd tracking faces difficulties in accurately identifying and monitoring objects from an aerial perspective, largely due to their small size and close proximity to one another, which complicates both localization and tracking. To address these challenges, we present the Density-aware Tracking (DenseTrack) framework. DenseTrack capitalizes on crowd counting to precisely determine object locations, blending visual and motion cues to improve the tracking of small-scale objects. It explicitly models cross-frame motion to enhance tracking accuracy and reliability. DenseTrack employs crowd density estimates as anchors for exact object localization within video frames. These estimates are merged with motion and position information from the tracking network, with motion offsets serving as key tracking cues. Moreover, DenseTrack enhances the ability to distinguish small-scale objects using insights from a visual-language model, integrating appearance with motion cues. The framework utilizes the Hungarian algorithm to ensure accurate matching of individuals across frames. Demonstrated on the DroneCrowd dataset, our approach exhibits superior performance, confirming its effectiveness in drone-captured scenarios. Our code will be available at: https://github.com/Zebrabeast/DenseTrack.
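The abstract's final matching step (fusing appearance and motion cues, then associating individuals across frames with the Hungarian algorithm) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `match_detections`, the cost weighting, and the gating radius are assumptions for the example; only the appearance-plus-motion cost fusion and Hungarian assignment come from the abstract.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_detections(prev_feats, prev_pos, curr_feats, curr_pos,
                     motion_weight=0.5, max_dist=50.0):
    """Associate objects across frames by fusing appearance and motion cues.

    prev_feats / curr_feats: (N, D) / (M, D) L2-normalized appearance embeddings.
    prev_pos / curr_pos:     (N, 2) / (M, 2) point locations in pixels.
    Returns a list of (prev_idx, curr_idx) matched pairs.
    """
    # Appearance cost: cosine distance between normalized embeddings.
    app_cost = 1.0 - prev_feats @ curr_feats.T
    # Motion cost: Euclidean offset between positions, scaled by a gating
    # radius so both cues live on a comparable scale.
    diff = prev_pos[:, None, :] - curr_pos[None, :, :]
    motion_cost = np.linalg.norm(diff, axis=-1) / max_dist
    # Fused cost matrix, solved globally with the Hungarian algorithm.
    cost = motion_weight * motion_cost + (1.0 - motion_weight) * app_cost
    rows, cols = linear_sum_assignment(cost)
    # Gate out implausible pairs that moved farther than the radius.
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if motion_cost[r, c] <= 1.0]
```

In a real tracker, `prev_pos` would be the motion-offset-predicted positions rather than raw previous-frame positions, and unmatched detections would spawn new tracks.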

Supplemental Material

MP4 File - DenseTrack: Drone-Based Crowd Tracking via Density-Aware Motion-Appearance Synergy
A brief introduction to DenseTrack: Drone-Based Crowd Tracking via Density-Aware Motion-Appearance Synergy


Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN:9798400706868
DOI:10.1145/3664647

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. crowd localization
  2. motion-appearance fusion
  3. vision-language pre-training
  4. multi-object tracking

Qualifiers

  • Research-article

Funding Sources

  • National Research Foundation Singapore under the AI Singapore Programme
  • Sanya Yazhou Bay Science and Technology City Administration scientific research project
  • Guangdong Natural Science Funds for Distinguished Young Scholar
  • National Natural Science Foundation of China

Conference

MM '24
Sponsor:
MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions (26%)
Overall Acceptance Rate: 2,145 of 8,556 submissions (25%)
