An IoT-enabled real-time overhead view person detection system based on Cascade-RCNN and transfer learning

  • Special Issue paper
  • Journal of Real-Time Image Processing

Abstract

The Internet of Things (IoT) is transforming technological evolution in several practical applications. These applications range from smart cities and smart healthcare to intelligent video surveillance, where the primary interest is person monitoring and detection. The amalgamation of Artificial Intelligence (AI) and IoT-based techniques maintains a balance between computational cost and efficiency that is essential for next-generation IoT networks. In this context, a real-time IoT-enabled people detection system is introduced. The developed system performs image processing tasks in the cloud over an internet connection, thereby reducing the computational cost by processing high-resolution images in the cloud. For person detection, a pre-trained Cascade RCNN, a deep learning approach, is used. It is an object detection architecture that seeks to address the degradation in performance at increased Intersection over Union (IoU) thresholds. Because the architecture is pre-trained on the COCO data set, and a person’s appearance from an overhead perspective is significantly different, additional training is performed to enhance the detection results. Taking advantage of transfer learning, the architecture is trained on overhead person images, and the newly trained feature layer is added to the existing architecture. Experimental outcomes reveal that the additional training improves the detection architecture’s performance, reaching an accuracy rate of 0.96.
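To make the role of the IoU threshold concrete, the following is a minimal sketch in Python and is not the authors' implementation. It computes IoU for axis-aligned boxes and counts how many candidate boxes still qualify as positives as the threshold rises. The box coordinates, the helper names `iou` and `count_positives`, and the thresholds 0.5, 0.6 and 0.7 are illustrative assumptions; the abstract does not state which thresholds the system uses.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_positives(proposals, ground_truth, threshold):
    """Number of proposals whose IoU with the ground truth meets the threshold."""
    return sum(1 for p in proposals if iou(p, ground_truth) >= threshold)


# Hypothetical ground-truth box and candidate boxes (illustrative values only).
ground_truth = (10, 10, 50, 50)
proposals = [
    (10, 10, 50, 50),  # IoU 1.00
    (10, 10, 50, 40),  # IoU 0.75
    (10, 10, 50, 36),  # IoU 0.65
    (10, 10, 50, 32),  # IoU 0.55
    (30, 30, 70, 70),  # IoU about 0.14
]

# Assumed, increasingly strict thresholds, one per hypothetical stage.
stage_thresholds = (0.5, 0.6, 0.7)
positives_per_stage = {
    t: count_positives(proposals, ground_truth, t) for t in stage_thresholds
}
print(positives_per_stage)
```

With these illustrative boxes, fewer candidates count as positives at each stricter threshold. A detector trained at a single high threshold therefore sees few positive examples, which is the kind of performance degradation a cascade of stages with rising thresholds is meant to address.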

Acknowledgements

This work was supported under the framework of the international cooperation program managed by the National Research Foundation of Korea (2019K1A3A1A8011295711).

Author information

Corresponding author

Correspondence to Gwanggil Jeon.

Cite this article

Ahmad, M., Ahmed, I. & Jeon, G. An IoT-enabled real-time overhead view person detection system based on Cascade-RCNN and transfer learning. J Real-Time Image Proc 18, 1129–1139 (2021). https://doi.org/10.1007/s11554-021-01103-0

  • DOI: https://doi.org/10.1007/s11554-021-01103-0
