
Depthwise Separable Convolutional Neural Networks for Pedestrian Attribute Recognition

  • Original Research
  • SN Computer Science

Abstract

Video surveillance is ubiquitous. Beyond understanding the objects in a scene, extracting human visual attributes from it has attracted considerable attention in recent years. The problem is challenging even for human observers. It is also a multi-label problem: a subject in a scene can have several attributes that we hope to recognize, such as shoe type, clothing type, or whether the subject is wearing an accessory or carrying an object. Many solutions have been presented over the years, and many researchers have employed convolutional neural networks (CNNs). In this work, we propose using a Depthwise Separable Convolutional Neural Network (DS-CNN) to solve the pedestrian attribute recognition problem. The network employs depthwise separable convolution layers (DSCLs) instead of regular 2D convolution layers. DS-CNN performs extremely well, especially on smaller datasets. In addition, because the network is compact, DS-CNN reduces the number of trainable parameters while making learning efficient. We evaluated our method on two benchmark pedestrian datasets, and the results show improvements over the state of the art.
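To illustrate why depthwise separable layers reduce the number of trainable parameters, the following minimal sketch compares weight counts for a single layer. The kernel size and channel counts are hypothetical values chosen for illustration, not taken from the paper; the sketch also assumes a depth multiplier of 1 and ignores bias terms.

```python
def conv2d_weights(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights in a regular 2D convolution (bias terms ignored)."""
    return kernel_size * kernel_size * c_in * c_out


def depthwise_separable_weights(kernel_size: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution (bias terms ignored).

    Assumes a depth multiplier of 1: one k x k filter per input channel
    (depthwise step), followed by a 1 x 1 convolution that mixes channels
    (pointwise step).
    """
    depthwise = kernel_size * kernel_size * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer shape, for illustration only.
k, c_in, c_out = 3, 64, 128

regular = conv2d_weights(k, c_in, c_out)
separable = depthwise_separable_weights(k, c_in, c_out)

print(f"regular 2D convolution:  {regular} weights")
print(f"depthwise separable:     {separable} weights")
print(f"reduction factor:        {regular / separable:.2f}x")
```

For these assumed sizes, the separable layer needs roughly one eighth of the weights. In general, the ratio of separable to regular weights is 1/c_out + 1/k², so the saving grows with the number of output channels and the kernel size.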



Author information

Corresponding author

Correspondence to Imran N. Junejo.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.


About this article

Cite this article

Junejo, I.N., Ahmed, N. Depthwise Separable Convolutional Neural Networks for Pedestrian Attribute Recognition. SN COMPUT. SCI. 2, 100 (2021). https://doi.org/10.1007/s42979-021-00493-z

  • DOI: https://doi.org/10.1007/s42979-021-00493-z
