
MobileNet-JDE: a lightweight multi-object tracking model for embedded systems

Multimedia Tools and Applications

Abstract

Multi-object tracking (MOT) is one of the most challenging tasks in computer vision. Although many MOT methods have been proposed in the literature, most of them cannot run in real time, especially on embedded platforms with limited computing resources. In this paper, we propose a real-time lightweight MOT method based on MobileNet to improve MOT processing speed. The proposed tracking method consists of a lightweight MOT model and a post-processing module. For the lightweight MOT model, we extend the lightweight object detection model proposed in our previous work by adding an appearance embedding layer. We also propose a new anchor box design and a novel feature pyramid network (FPN) to improve the tracking accuracy of the proposed method. In the post-processing module, we replace the Kalman filter used in data association with a simple filtering method to speed up processing. Experimental results show that the proposed MOT method reaches processing speeds of 50.5 frames per second (FPS) on a desktop computer and 12.6 FPS on an embedded platform. Moreover, the proposed MOT method provides competitive tracking performance compared with existing MOT methods. These advantages make the proposed method suitable for many applications running on embedded platforms, such as visual surveillance, visual tracking control of mobile robots, and human-robot interaction.
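The abstract does not say what form the simple filtering method takes, so the following is only an illustrative sketch of how a lightweight filter could stand in for a Kalman filter when predicting where a track will be in the next frame. It uses exponential smoothing of the per-frame box displacement, which is an assumption and not the authors' method. The names `SmoothedTrack`, `alpha`, and the `(cx, cy, w, h)` box layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical box layout: (center x, center y, width, height).
Box = Tuple[float, float, float, float]


@dataclass
class SmoothedTrack:
    """Hypothetical per-track state: last box plus a smoothed per-frame velocity."""

    box: Box
    velocity: Box = (0.0, 0.0, 0.0, 0.0)
    alpha: float = 0.5  # assumed smoothing factor, not a value from the paper

    def predict(self) -> Box:
        # Constant-velocity guess for the next frame, used before matching.
        return tuple(b + v for b, v in zip(self.box, self.velocity))

    def update(self, detection: Box) -> None:
        # Blend the newest displacement into the velocity estimate,
        # then adopt the matched detection as the current box.
        delta = tuple(d - b for d, b in zip(detection, self.box))
        self.velocity = tuple(
            self.alpha * d + (1.0 - self.alpha) * v
            for d, v in zip(delta, self.velocity)
        )
        self.box = detection


# Usage: one track observed moving 2 px to the right in a single frame.
track = SmoothedTrack(box=(10.0, 10.0, 4.0, 8.0))
track.update((12.0, 10.0, 4.0, 8.0))
predicted = track.predict()
```

A filter of this kind needs only a few additions and multiplications per track and no covariance matrices, which is the sort of saving that motivates replacing a Kalman filter on an embedded platform. The trade-off is that it carries no uncertainty estimate for gating matches.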



Acknowledgments

The authors sincerely thank Professor Humaira Nisar from the Department of Electronics Engineering of Universiti Tunku Abdul Rahman, Malaysia, for participating in the revision of the manuscript. This research was supported by the Ministry of Science and Technology of Taiwan under Grant MOST 110-2221-E-032-047 and Grant MOST 109-2221-E-032-039.

Author information

Corresponding author

Correspondence to Chi-Yi Tsai.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Tsai, CY., Su, YK. MobileNet-JDE: a lightweight multi-object tracking model for embedded systems. Multimed Tools Appl 81, 9915–9937 (2022). https://doi.org/10.1007/s11042-022-12095-9


  • DOI: https://doi.org/10.1007/s11042-022-12095-9
