A real-time person tracking system based on SiamMask network for intelligent video surveillance

  • Special Issue Paper
Journal of Real-Time Image Processing

Abstract

Real-time video surveillance systems are widely deployed in public areas, commercial buildings, and public infrastructure. Person detection is a key task in video surveillance applications that involve detecting, segmenting, and tracking people. Researchers have proposed various image processing and artificial intelligence-based approaches (including machine learning and deep learning) for person detection and tracking, but most of them assume a frontal camera perspective. This work introduces a real-time person tracking and segmentation system that uses an overhead camera perspective. The system applies SiamMask, a deep learning-based algorithm that is simple, versatile, and fast, and that surpasses other real-time tracking algorithms. The algorithm also segments the target person by adding a mask branch to the fully convolutional twin (Siamese) neural network used for tracking. First, person video sequences are obtained from an overhead perspective; additional training is then performed with the help of transfer learning. Finally, the system is compared with other tracking algorithms. The SiamMask algorithm delivers good results, with a tracking accuracy of 95%.
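As a rough illustration of the matching step inside a fully convolutional Siamese tracker, the sketch below slides an exemplar feature map over a larger search feature map and takes the peak of the response as the target position. It is not the authors' implementation. The feature maps are hand-written toy values rather than CNN outputs, the function names `depthwise_xcorr` and `peak_location` are hypothetical, a depth-wise (per-channel) correlation is assumed, and the mask branch that would predict a segmentation mask from each response is omitted.

```python
def depthwise_xcorr(search, exemplar):
    """Slide each exemplar channel over the matching search channel.

    search:   list of C channels, each an H x W list of floats
    exemplar: list of C channels, each an h x w list of floats
    Returns C response maps of size (H-h+1) x (W-w+1).
    """
    responses = []
    for s_ch, e_ch in zip(search, exemplar):
        H, W = len(s_ch), len(s_ch[0])
        h, w = len(e_ch), len(e_ch[0])
        resp = [
            [
                sum(
                    s_ch[i + di][j + dj] * e_ch[di][dj]
                    for di in range(h)
                    for dj in range(w)
                )
                for j in range(W - w + 1)
            ]
            for i in range(H - h + 1)
        ]
        responses.append(resp)
    return responses


def peak_location(responses):
    """Sum the per-channel maps and return (row, col) of the strongest response."""
    rows, cols = len(responses[0]), len(responses[0][0])
    best, best_pos = float("-inf"), (0, 0)
    for i in range(rows):
        for j in range(cols):
            score = sum(ch[i][j] for ch in responses)
            if score > best:
                best, best_pos = score, (i, j)
    return best_pos


# Toy single-channel example: the 2x2 exemplar pattern is embedded in the
# search map with its top-left corner at row 1, column 2.
exemplar = [[[1.0, -1.0], [-1.0, 1.0]]]
search = [[
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -1.0, 0.0],
    [0.0, 0.0, -1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]]
responses = depthwise_xcorr(search, exemplar)
print(peak_location(responses))  # prints (1, 2)
```

In a trained tracker the two inputs would come from the same convolutional backbone applied to the target crop and to the current frame, which is what makes the network "twin". Keeping one response map per channel, rather than summing immediately, is the design choice that would let an additional branch read richer information from each candidate position.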

Acknowledgements

This work was supported by the Incheon National University Research Concentration Professors Grant in 2019.

Author information

Corresponding author

Correspondence to Gwanggil Jeon.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ahmed, I., Jeon, G. A real-time person tracking system based on SiamMask network for intelligent video surveillance. J Real-Time Image Proc 18, 1803–1814 (2021). https://doi.org/10.1007/s11554-021-01144-5

  • DOI: https://doi.org/10.1007/s11554-021-01144-5
