DOI: 10.1145/3357384.3358029

Multi-Target Multi-Camera Tracking with Human Body Part Semantic Features

Published: 03 November 2019

Abstract

Multi-Target Multi-Camera Tracking (MTMCT) has attracted growing attention. It is a challenging task whose major difficulties include occlusion, background clutter, and variations in pose and camera viewpoint. Unlike single-camera tracking, which can exploit location information and strict time constraints, MTMCT depends more heavily on good appearance features. This motivates us to extract robust and discriminative features for MTMCT. We propose MTMCT_HS, which uses human body part semantic features to address these challenges. We use a two-stream deep neural network to extract global appearance features and human body part semantic maps separately, and apply aggregation operations to generate the final features. We argue that these features are better suited to affinity measurement, which can be seen as the average of appearance similarity weighted by the corresponding human body part similarity. Our tracker then adopts a hierarchical correlation clustering algorithm that combines targets' appearance feature similarity with motion correlation for data association. We validate MTMCT_HS by demonstrating its superiority over the state-of-the-art method on the DukeMTMC benchmark. Experiments show that features with human body part semantics are more effective for MTMCT than those of methods that employ only global appearance features.
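
The statement that the affinity "can be seen as the average of appearance similarity weighted by the corresponding human body part similarity" can be illustrated with a small sketch. The abstract does not specify the aggregation operation, so the code below assumes one common choice: pooling appearance vectors with the part maps as weights (an outer product averaged over image locations). All function names and the toy data are hypothetical and are not taken from the paper. Under that assumption, the inner product of two pooled features equals the average, over all pairs of locations, of appearance similarity multiplied by part similarity.

```python
from itertools import product


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def aggregate(appearance, parts):
    """Pool appearance vectors using part maps as weights (assumed aggregation).

    appearance: list of N appearance vectors (length C), one per location.
    parts:      list of N part-score vectors (length K), one per location.
    Returns a K x C matrix whose entry [k][c] is the mean over locations
    of parts[x][k] * appearance[x][c].
    """
    n = len(appearance)
    k_dim, c_dim = len(parts[0]), len(appearance[0])
    pooled = [[0.0] * c_dim for _ in range(k_dim)]
    for a_vec, p_vec in zip(appearance, parts):
        for k in range(k_dim):
            for c in range(c_dim):
                pooled[k][c] += p_vec[k] * a_vec[c] / n
    return pooled


def affinity(pooled_a, pooled_b):
    """Inner product of two pooled K x C features."""
    return sum(dot(row_a, row_b) for row_a, row_b in zip(pooled_a, pooled_b))


def pairwise_affinity(app_a, parts_a, app_b, parts_b):
    """Average over location pairs of (part similarity * appearance similarity)."""
    total = 0.0
    for x, y in product(range(len(app_a)), range(len(app_b))):
        total += dot(parts_a[x], parts_b[y]) * dot(app_a[x], app_b[y])
    return total / (len(app_a) * len(app_b))


# Toy detections: two locations, two appearance channels, two body parts.
# In detection B the same parts appear at swapped locations.
app_a, parts_a = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
app_b, parts_b = [[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]

pooled_score = affinity(aggregate(app_a, parts_a), aggregate(app_b, parts_b))
pair_score = pairwise_affinity(app_a, parts_a, app_b, parts_b)
print(pooled_score, pair_score)
```

In this toy case both computations agree, and the score stays high even though the body parts sit at different image locations in the two detections, which is the kind of robustness to pose and viewpoint change that part-weighted features are meant to provide.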

    Published In

    CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
    November 2019
    3373 pages
    ISBN: 9781450369763
    DOI: 10.1145/3357384
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. feature fusion
    2. hierarchical correlation clustering
    3. multi-target multi-camera tracking

    Qualifiers

    • Research-article

    Acceptance Rates

    CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
