Abstract
With recent advances in object detection, tracking-by-detection has become the leading paradigm in multi-object tracking. By extracting different features from detected objects, these algorithms estimate the similarities and association patterns of objects across successive frames. However, because the similarity functions used by tracking algorithms are handcrafted, they are difficult to reuse in new contexts. This study investigates the use of artificial neural networks to learn a similarity function that can be applied between detections. During training, multilayer perceptron (MLP) neural networks were presented with correct and incorrect association patterns sampled from a pedestrian tracking data set. To this end, different combinations of motion and appearance features were explored. Finally, a trained MLP was inserted into a multiple-object tracking framework, which was assessed on the MOT Challenge benchmark. In the experiments, the proposed tracker matched the results of state-of-the-art methods, scoring a tracking accuracy of 60.4% while running 58% faster than DeepSORT, a recent and similar method used as a baseline. Overall, this work demonstrates that the method can be automatically trained for different tracking contexts and that it is highly cost-effective for online real-time tracking applications.
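The idea of replacing a handcrafted similarity function with a small MLP can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the configuration reported in the paper: the two motion features (IoU and normalised centre distance), the single hidden layer, the hand-set placeholder weights in `PARAMS`, the `threshold` value, and the greedy matching step, which stands in for whatever assignment procedure the actual tracker uses.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def pair_features(track_box, det_box):
    """Hypothetical motion features for one (track, detection) pair."""
    tcx, tcy = track_box[0] + track_box[2] / 2, track_box[1] + track_box[3] / 2
    dcx, dcy = det_box[0] + det_box[2] / 2, det_box[1] + det_box[3] / 2
    scale = math.hypot(track_box[2], track_box[3])  # track box diagonal
    return [iou(track_box, det_box), math.hypot(tcx - dcx, tcy - dcy) / scale]

def mlp_similarity(x, params):
    """One hidden ReLU layer followed by a sigmoid output in (0, 1)."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    if logit >= 0:  # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)

# Placeholder weights set by hand; a real model would learn them from
# labelled correct/incorrect association pairs.
PARAMS = ([[4.0, -2.0], [-1.0, 3.0]], [0.0, 0.0], [2.0, -2.0], -1.0)

def associate(tracks, detections, params=PARAMS, threshold=0.5):
    """Greedy matching on MLP scores (a simplification of the assignment step)."""
    scored = [(mlp_similarity(pair_features(t, d), params), i, j)
              for i, t in enumerate(tracks)
              for j, d in enumerate(detections)]
    scored.sort(reverse=True)  # highest similarity first
    matches, used_tracks, used_dets = [], set(), set()
    for score, i, j in scored:
        if score < threshold:
            break  # all remaining pairs are scored even lower
        if i in used_tracks or j in used_dets:
            continue
        matches.append((i, j))
        used_tracks.add(i)
        used_dets.add(j)
    return sorted(matches)

tracks = [(0, 0, 10, 20), (100, 0, 10, 20)]
detections = [(101, 1, 10, 20), (1, 0, 10, 20)]
print(associate(tracks, detections))
```

In this sketch, each existing track is paired with the nearby detection in the next frame, and pairs scoring below the threshold are left unmatched. Retraining the weights on data from another domain would adapt the similarity function without changing the matching code, which is the property the abstract highlights.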
Acknowledgements
This study has been financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001. This research was carried out using the computational resources of the Center for Mathematical Sciences Applied to Industry (CeMEAI) funded by FAPESP (Grant 2013/07375-0). The authors thank CNPq for granting a productivity scholarship to Hendrik Macedo [DT-II, Processo 306073/2017-0].
Ethics declarations
Conflict of interest
The authors declare that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
About this article
Cite this article
Meneses, M., Matos, L., Prado, B. et al. SmartSORT: an MLP-based method for tracking multiple objects in real-time. J Real-Time Image Proc 18, 913–921 (2021). https://doi.org/10.1007/s11554-020-01054-y
DOI: https://doi.org/10.1007/s11554-020-01054-y