Fusion of Modern and Tradition: A Multi-stage-Based Deep Network Approach for Head Detection

Hsu, Fu-Chun; Hung, Chih-Chieh

doi:10.1007/978-3-319-93034-3_32

Conference paper
First Online: 19 June 2018

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10937)

Abstract

Detecting humans in video is becoming essential for monitoring crowd behavior, and head detection has proven to be a promising way to detect and track people in crowds. This paper proposes a novel learning strategy for head detection, called the Deep Motion Information Network (DMIN). The idea behind DMIN is to borrow well-developed traditional head detection approaches, which are composed of multiple stages, and to replace the stages in the pipeline with a cascade of sub-deep-networks, each simulating the function of one stage. This learning strategy offers several benefits: it avoids much of the trial and error in designing deep networks, achieves global optimization for each stage, and reduces the amount of training data needed. The proposed approach is validated on the PETS2009 dataset. The results show that it substantially speeds up processing and significantly improves recall. The proposed network achieves an F-score of 85%, which is far higher than that of other methods proposed in the literature.
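The staged design and the reported metric can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stage names (`motion_stage`, `proposal_stage`, `classify_stage`), their toy thresholds, and the `run_cascade` helper are hypothetical stand-ins, since the abstract does not say which stages DMIN uses or how the sub-networks are built. In DMIN each stage would be a trained sub-network rather than a hand-written function. The `f_score` function is the standard harmonic mean of precision and recall; the abstract reports only the F-score, so the precision and recall values below are made up.

```python
from functools import reduce
from typing import Callable, List, Sequence

# A stage maps a list of values to a new list of values.
# In DMIN, each stage would be a sub-deep-network (assumption: not shown here).
Stage = Callable[[List[int]], List[int]]


def run_cascade(stages: Sequence[Stage], data: List[int]) -> List[int]:
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda current, stage: stage(current), stages, data)


def motion_stage(pixels: List[int]) -> List[int]:
    # Hypothetical "motion feature": absolute difference of neighbouring values.
    return [abs(b - a) for a, b in zip(pixels, pixels[1:])]


def proposal_stage(features: List[int]) -> List[int]:
    # Hypothetical candidate filter: keep responses above a toy threshold.
    return [f for f in features if f > 5]


def classify_stage(candidates: List[int]) -> List[int]:
    # Hypothetical classifier: accept candidates below a toy upper bound.
    return [c for c in candidates if c < 30]


def f_score(precision: float, recall: float) -> float:
    """F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy input standing in for one row of pixel intensities.
detections = run_cascade(
    [motion_stage, proposal_stage, classify_stage],
    [0, 10, 12, 50, 51],
)
print(detections)

# Made-up precision/recall values, only to show how an F-score is computed.
print(round(f_score(0.8, 0.9), 3))
```

Because every stage shares the same input and output type, a single stage can be swapped for a learned model without touching the rest of the cascade, which is the property the abstract attributes to the multi-stage design.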

Acknowledgment

This work was supported by the Ministry of Science and Technology, Taiwan, under Grant MOST 106-2218-E-032-004-MY2.

Author information

Authors and Affiliations

Department of Electric Engineering, University of Melbourne, Melbourne, Australia
Fu-Chun Hsu
Department of Computer Science and Information Engineering, Tamkang University, Taipei, Taiwan
Chih-Chieh Hung

Corresponding author

Correspondence to Chih-Chieh Hung.

Editor information

Editors and Affiliations

Deakin University, Geelong, Victoria, Australia
Dinh Phung
National Chiao Tung University, Hsinchu City, Taiwan
Vincent S. Tseng
Monash University, Clayton, Victoria, Australia
Geoffrey I. Webb
Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Bao Ho
University of Melbourne, Melbourne, Victoria, Australia
Mohadeseh Ganji
University of Melbourne, Melbourne, Victoria, Australia
Lida Rashidi

About this paper

Cite this paper

Hsu, FC., Hung, CC. (2018). Fusion of Modern and Tradition: A Multi-stage-Based Deep Network Approach for Head Detection. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10937. Springer, Cham. https://doi.org/10.1007/978-3-319-93034-3_32

DOI: https://doi.org/10.1007/978-3-319-93034-3_32
Published: 19 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93033-6
Online ISBN: 978-3-319-93034-3
eBook Packages: Computer Science, Computer Science (R0)
