SSL-MOT: self-supervised learning based multi-object tracking

Applied Intelligence

Abstract

Although Siamese networks are the most popular approach to object tracking, they can collapse to an undesirable trivial solution and require a large amount of training data that reflects frame-to-frame changes in the object’s shape. To solve this problem, this paper proposes a self-supervised learning method for multi-object tracking (SSL-MOT) based on a contrastive structure. Unlike existing SSL methods, we adopt a generative adversarial network as a preprocessing step to generate various pose changes of the tracked objects. A positive pair composed of the augmented image and the pose data is fed to the SSL network to learn an encoder that generates a non-collapsed output vector. To improve the discriminative power of the encoder’s output features, we propose an affinity correlation distance, which combines invariance and redundancy terms, as the training loss. At test time, data association uses only the dot product between the output vectors of the tracker and the detection, which significantly reduces computation time and makes real-time online tracking at about 12 fps possible. The proposed method is the first attempt to apply SSL to online MOT. Experimental results on the MOT16, MOT17, and MOT20 challenge datasets show that the proposed method is a fast and reasonable tracker that occupies less memory and achieves excellent tracking performance compared with other state-of-the-art methods.
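The dot-product data association mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the matching procedure, so everything here is an assumption made for illustration: the function names (`l2_normalize`, `affinity_matrix`, `greedy_associate`), the L2 normalization of the embeddings, the greedy one-to-one matching, and the `min_affinity` threshold are hypothetical and are not taken from the paper. The toy vectors stand in for encoder outputs.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so a dot product acts as cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def affinity_matrix(track_feats, det_feats):
    """Dot product between every tracker embedding and every detection embedding."""
    tracks = [l2_normalize(t) for t in track_feats]
    dets = [l2_normalize(d) for d in det_feats]
    return [[sum(a * b for a, b in zip(t, d)) for d in dets] for t in tracks]


def greedy_associate(affinity, min_affinity=0.5):
    """Greedy one-to-one matching (an assumed, illustrative rule).

    Pairs are visited from highest to lowest affinity; a pair is accepted
    only if neither its track nor its detection has been matched yet.
    Pairs below `min_affinity` (a hypothetical threshold) are never matched.
    """
    candidates = sorted(
        (
            (score, i, j)
            for i, row in enumerate(affinity)
            for j, score in enumerate(row)
            if score >= min_affinity
        ),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), []
    for _, i, j in candidates:
        if i in used_tracks or j in used_dets:
            continue
        matches.append((i, j))
        used_tracks.add(i)
        used_dets.add(j)
    return sorted(matches)


# Toy embeddings standing in for encoder outputs (not real data).
tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
detections = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]

matches = greedy_associate(affinity_matrix(tracks, detections))
print(matches)  # (track index, detection index) pairs
```

In this toy case the third detection matches no track and would typically start a new track. Because the affinity is a single dot product per pair, the association cost grows only with the number of tracks times the number of detections, which is consistent with the speed argument in the abstract.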

Acknowledgements

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), Ministry of Education, under Grant 2019R1I1A3A01042506.

Author information

Corresponding author

Correspondence to Byoung Chul Ko.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kim, S., Lee, J. & Ko, B.C. SSL-MOT: self-supervised learning based multi-object tracking. Appl Intell 53, 930–940 (2023). https://doi.org/10.1007/s10489-022-03473-9

  • DOI: https://doi.org/10.1007/s10489-022-03473-9
