ABSTRACT
Head detection is a key problem for automated passenger counting systems. In recent decades, considerable effort has been devoted to developing an accurate and reliable head detector. However, head detection remains challenging because of variations in pose and occlusion. Recently, general object detection algorithms based on convolutional neural networks (CNNs), such as Faster R-CNN, SSD, and YOLO, have been successful. However, these algorithms require a Graphics Processing Unit (GPU) for real-time performance. In this study, we focused on developing real-time head detection for an embedded system. Starting with the Tiny-YOLOv3 network, we applied the following strategies to achieve real-time performance in a non-GPU environment. First, we reduced the input image size to 224 × 224. Second, we added an extra YOLO layer to detect smaller heads. Third, we removed batch normalization. Finally, we replaced standard convolution with depthwise separable convolution. Three public datasets, HollywoodHeads, SCUT_HEAD, and CrowdHuman, were used to train and test the proposed network, and Average Precision (AP) at an Intersection over Union (IoU) threshold of 0.5 was used as the evaluation metric. Experimental results showed that the proposed network performs better and runs faster than Tiny-YOLOv3.
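To illustrate why swapping standard convolution for depthwise separable convolution can reduce computation, the following minimal sketch compares the weight counts of the two layer types. The function names and the layer sizes (a 3 × 3 kernel with 32 input and 64 output channels) are hypothetical and chosen only for illustration; they are not taken from the proposed network.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = kernel * kernel * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only (not from the paper).
k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(sep / std, 3))  # 18432 2336 0.127
```

For these assumed sizes the separable layer needs roughly one eighth of the weights, in line with the general ratio 1/c_out + 1/k². The multiply-accumulate count per output position scales the same way, which is the usual motivation for using this layer type on CPU-only hardware.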
Index Terms
- Real-time Head Detection for Automated Passenger Counting in Embedded Systems