Abstract
Multi-object tracking (MOT) is one of the most challenging tasks in computer vision. Although many MOT methods have been proposed, most cannot run in real time, especially on embedded platforms with limited computing resources. In this paper, we propose a real-time lightweight MOT method based on MobileNet to improve MOT processing speed. The proposed method consists of a lightweight MOT model and a post-processing module. For the lightweight MOT model, we extend the lightweight object detection model from our previous work by adding an appearance embedding layer. We also propose a new anchor box design and a novel feature pyramid network (FPN) to improve tracking accuracy. In the post-processing module, we replace the Kalman filter used in data association with a simple filtering method to speed up processing. Experimental results show that the proposed MOT method reaches processing speeds of 50.5 frames per second (FPS) on a desktop computer and 12.6 FPS on an embedded platform. The proposed method also provides competitive tracking performance compared with existing MOT methods. These advantages make it suitable for many applications running on embedded platforms, such as visual surveillance, visual tracking control of mobile robots, and human-robot interaction.
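To put the reported throughput in terms of latency, the short sketch below converts the two FPS figures from the abstract into an average time budget per frame. It is only an illustrative calculation: the helper name `frame_budget_ms` and the dictionary `REPORTED_FPS` are hypothetical and do not come from the paper, and the result is an average that says nothing about how the time is split between the model and the post-processing module.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the average time available per frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Throughput figures reported in the abstract (hypothetical container name).
REPORTED_FPS = {"desktop": 50.5, "embedded": 12.6}

for platform, fps in REPORTED_FPS.items():
    # Average per-frame latency implied by the reported frame rate.
    print(f"{platform}: {fps} FPS ~ {frame_budget_ms(fps):.1f} ms per frame")
```

Under this reading, the desktop figure corresponds to roughly 20 ms per frame and the embedded figure to roughly 79 ms per frame.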
Acknowledgments
The authors sincerely thank Professor Humaira Nisar from the Department of Electronics Engineering of Universiti Tunku Abdul Rahman, Malaysia, for participating in the revision of the manuscript. This research was supported by the Ministry of Science and Technology of Taiwan under Grant MOST 110-2221-E-032-047 and Grant MOST 109-2221-E-032-039.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Tsai, CY., Su, YK. MobileNet-JDE: a lightweight multi-object tracking model for embedded systems. Multimed Tools Appl 81, 9915–9937 (2022). https://doi.org/10.1007/s11042-022-12095-9
DOI: https://doi.org/10.1007/s11042-022-12095-9