Abstract
The Internet of Things (IoT) is transforming a range of practical applications, from smart cities and smart healthcare to intelligent video surveillance, where the primary interest is person monitoring and detection. Combining Artificial Intelligence (AI) with IoT-based techniques helps balance computational cost and efficiency, which is essential for next-generation IoT networks. In this context, a real-time IoT-enabled people detection system is introduced. The developed system performs the image processing task in the cloud over an internet connection, which reduces the local computational cost of processing high-resolution images. For person detection, a pre-trained Cascade RCNN, a deep learning approach, is used. It is an object detection architecture that seeks to address the degradation in detection performance as Intersection over Union (IoU) thresholds increase. Because the architecture is pre-trained on the COCO data set, and a person's appearance from an overhead perspective is significantly different, additional training is performed to enhance the detection results. Using transfer learning, the architecture is trained on overhead person images, and the newly trained feature layer is added to the existing architecture. Experimental outcomes reveal that the additional training improves the detection architecture's performance, yielding an accuracy rate of 0.96.
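To make the role of the IoU threshold concrete, the following minimal sketch computes IoU for two axis-aligned boxes and checks whether a proposal would count as a positive sample at a sequence of increasing thresholds. The thresholds 0.5, 0.6, and 0.7 are the stage values reported for the original Cascade R-CNN; whether this system uses the same values is an assumption, and the function names `iou` and `positives_per_stage` are hypothetical helpers introduced only for illustration.

```python
from typing import List, Sequence, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positives_per_stage(
    proposal: Box,
    ground_truth: Box,
    thresholds: Sequence[float] = (0.5, 0.6, 0.7),  # assumed stage thresholds
) -> List[bool]:
    """For each stage threshold, report whether the proposal is a positive."""
    overlap = iou(proposal, ground_truth)
    return [overlap >= t for t in thresholds]


if __name__ == "__main__":
    ground_truth = (0.0, 0.0, 10.0, 10.0)
    proposal = (0.0, 0.0, 10.0, 6.5)
    print(iou(proposal, ground_truth))
    print(positives_per_stage(proposal, ground_truth))
```

A proposal that passes the loosest threshold can still fail a stricter one, which is why a detector trained at a single threshold tends to lose quality as the threshold rises; a cascade trains successive stages at progressively stricter thresholds.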









Acknowledgements
This work was supported under the framework of the international cooperation program managed by the National Research Foundation of Korea (2019K1A3A1A8011295711).
About this article
Cite this article
Ahmad, M., Ahmed, I. & Jeon, G. An IoT-enabled real-time overhead view person detection system based on Cascade-RCNN and transfer learning. J Real-Time Image Proc 18, 1129–1139 (2021). https://doi.org/10.1007/s11554-021-01103-0
DOI: https://doi.org/10.1007/s11554-021-01103-0