Abstract
We propose a joint feature and metric learning deep neural network architecture, called the associative affinity network (AAN), as an affinity model for multi-object tracking (MOT) in videos. The AAN learns the associative affinity between tracks and detections across frames in an end-to-end manner. To cope with flawed detections, the AAN jointly learns bounding-box regression, classification, and affinity regression via the proposed multi-task loss. Unlike networks trained with a ranking loss, we directly train a binary classifier to learn the associative affinity of each track-detection pair, and use a matching cardinality loss to capture information among candidate pairs. The AAN thus learns a discriminative affinity model for data association in MOT, and can also perform single-object tracking. Based on the AAN, we build a simple multi-object tracker that achieves competitive performance on the public MOT16 and MOT17 test datasets.
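The multi-task objective described above combines a bounding-box regression term, a classification term, and a per-pair affinity term trained as a binary classifier. The sketch below is only an illustration of that structure, not the authors' implementation: the smooth-L1 form for box regression, the binary cross-entropy for both classification and affinity, and the unit loss weights are all assumptions.

```python
import math

def smooth_l1(pred, target):
    # Smooth-L1 (Huber) loss, a common choice for bounding-box regression.
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def bce(p, y, eps=1e-7):
    # Binary cross-entropy, used here for both the classification term
    # and the pairwise track-detection affinity term.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def multi_task_loss(box_pred, box_gt, cls_prob, cls_gt,
                    aff_probs, aff_gts, w_box=1.0, w_cls=1.0, w_aff=1.0):
    # Weighted sum of the three task losses; the weights are illustrative.
    l_box = sum(smooth_l1(p, t) for p, t in zip(box_pred, box_gt)) / len(box_gt)
    l_cls = bce(cls_prob, cls_gt)
    l_aff = sum(bce(p, y) for p, y in zip(aff_probs, aff_gts)) / len(aff_gts)
    return w_box * l_box + w_cls * l_cls + w_aff * l_aff
```

With perfect predictions all three terms vanish, so the combined loss goes to zero, which is the sanity check one would expect of any such multi-task objective.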
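To make the data-association step concrete: once a classifier scores every track-detection pair, those affinities can drive a matching step. The greedy routine below is an assumed, simplified stand-in for the tracker's association logic (the paper's tracker on MOT16/MOT17 may use a different matcher, e.g. Hungarian assignment); the threshold value is also an assumption.

```python
def greedy_associate(affinity, threshold=0.5):
    """Greedily match tracks (rows) to detections (columns) by descending
    affinity, keeping only pairs whose score exceeds the threshold.
    Returns a list of (track_idx, det_idx) matches; the number of matches
    is the matching cardinality."""
    pairs = sorted(
        ((a, i, j) for i, row in enumerate(affinity) for j, a in enumerate(row)),
        reverse=True)
    used_t, used_d, matches = set(), set(), []
    for a, i, j in pairs:
        if a < threshold:
            break  # remaining pairs score too low to be matches
        if i in used_t or j in used_d:
            continue  # each track and detection is matched at most once
        matches.append((i, j))
        used_t.add(i)
        used_d.add(j)
    return matches
```

For example, with `affinity = [[0.9, 0.1], [0.2, 0.8]]` the routine returns `[(0, 0), (1, 1)]`, while `[[0.9, 0.85], [0.2, 0.1]]` yields only `[(0, 0)]`: track 1 matches nothing above the threshold, so the cardinality of the matching drops to one.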
Contributions
Liang MA and Qiaoyong ZHONG contributed to methodology, validation, and writing. Yingying ZHANG contributed to experiment design. Di XIE and Shiliang PU contributed to supervision and project administration.
Ethics declarations
Liang MA, Qiaoyong ZHONG, Yingying ZHANG, Di XIE, and Shiliang PU declare that they have no conflict of interest.
Additional information
Project supported by the National Key Research and Development Program of China (No. 2020AAA0109004) and the Zhejiang Postdoc Sponsorship
Cite this article
Ma, L., Zhong, Q., Zhang, Y. et al. Associative affinity network learning for multi-object tracking. Front Inform Technol Electron Eng 22, 1194–1206 (2021). https://doi.org/10.1631/FITEE.2000272