ABSTRACT
Multiple object tracking (MOT) methods built on single object tracking are of great interest because the localization capability of a single-object tracker lets them balance efficiency and accuracy. However, most single object tracking methods only distinguish foreground from background, so they are easily misled by similar-looking distractors during localization. In multiple object tracking scenarios there are more such distractors, and their influence is more severe. We therefore propose Distractor-Suppressing Graph Attention (DSGA), which learns more discriminative attention by reducing the influence of distractors on the learning of attention weight features. To verify its effectiveness, we embed DSGA into the basic MOT framework "SiamMOT", forming DSGA-SiamMOT, and apply it to multiple object tracking. In experiments on the MOT Challenge benchmark with "public detection", DSGA-SiamMOT obtains 66.65% MOTA and 62.2% IDF1 on the MOT17 dataset at 14 fps.
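For readers less familiar with the two reported metrics, the following is a minimal sketch of their commonly used definitions: MOTA as one minus the ratio of all errors (misses, false positives, identity switches) to ground-truth objects, and IDF1 as the F1 score over identity-matched detections. The function names and the counts in the usage lines are hypothetical illustrations, not values or code from this work.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy from error counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 from identity-level true/false positives and false negatives."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Illustrative counts only (assumed, not taken from any benchmark run).
example_mota = mota(false_negatives=20, false_positives=10, id_switches=5, num_gt=100)
example_idf1 = idf1(idtp=60, idfp=20, idfn=20)
```

MOTA penalizes detection errors and identity switches together, while IDF1 focuses on how consistently identities are preserved, which is why the two are usually reported side by side.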