Abstract
Because primary (low-level) visual features alone handle pedestrian detection poorly in complex scenes, we present a method that improves pedestrian detection by combining a visual attention mechanism with semantic computation. After determining a saliency map with the visual attention mechanism, we calculate additional saliency maps for human skin and for the human head-shoulder region. A static visual attention model is then established using a Laplacian pyramid to obtain a total saliency map, from which pedestrian detection is completed. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on the INRIA dataset, with 92.78% pedestrian detection accuracy at a very competitive time cost.
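The abstract does not give the details of how the Laplacian pyramid produces the total saliency map, so the following is only a minimal sketch of one plausible reading: three saliency maps (bottom-up, skin, head-shoulder) are each decomposed into a Laplacian pyramid and fused level by level. The fusion rule (largest-magnitude coefficient at detail levels, plain average at the coarsest level), the 2x2 box down-sampling, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging each 2x2 block (sides must be even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    """Return `levels` detail layers followed by the coarsest residual."""
    h, w = img.shape
    if h % (2 ** levels) or w % (2 ** levels):
        raise ValueError("image sides must be divisible by 2**levels")
    pyramid = []
    current = img.astype(np.float64)
    for _ in range(levels):
        smaller = downsample(current)
        pyramid.append(current - upsample(smaller))  # detail lost by down-sampling
        current = smaller
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert laplacian_pyramid by adding details back from coarse to fine."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = upsample(current) + detail
    return current

def fuse_saliency_maps(maps, levels=3):
    """Fuse equally sized saliency maps (values in [0, 1]) into one map.

    Assumed rule: keep the largest-magnitude coefficient at each detail
    level and average the coarsest level across maps.
    """
    pyramids = [laplacian_pyramid(m, levels) for m in maps]
    fused = []
    for level in range(levels):
        stack = np.stack([p[level] for p in pyramids])   # (n_maps, h, w)
        winner = np.abs(stack).argmax(axis=0)            # strongest map per pixel
        fused.append(np.take_along_axis(stack, winner[None], axis=0)[0])
    fused.append(np.mean([p[-1] for p in pyramids], axis=0))
    return np.clip(reconstruct(fused), 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder inputs standing in for the three saliency maps.
    bottom_up = rng.random((64, 32))
    skin = rng.random((64, 32))
    head_shoulder = rng.random((64, 32))
    total = fuse_saliency_maps([bottom_up, skin, head_shoulder])
    print(total.shape, float(total.min()), float(total.max()))
```

In a real pipeline the random placeholders would be replaced by the actual saliency maps, and the fused map would be thresholded or passed to a classifier to propose pedestrian regions; Gaussian smoothing before down-sampling would be the more conventional pyramid construction.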
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61572392) and the Shaanxi Provincial Natural Science Foundation (Grant No. 2017JC2-08).
About this article
Cite this article
Xiao, F., Liu, B. & Li, R. Pedestrian object detection with fusion of visual attention mechanism and semantic computation. Multimed Tools Appl 79, 14593–14607 (2020). https://doi.org/10.1007/s11042-018-7143-6
DOI: https://doi.org/10.1007/s11042-018-7143-6