
Exploiting interaction of fine and coarse features and attribute co-occurrence for person attribute recognition

Multimedia Tools and Applications

Abstract

Person attribute recognition, i.e., the prediction of a fixed set of semantic attributes given an image of a person, has become an important topic in computer vision. Recently, methods based on convolutional neural networks (CNNs) have shown outstanding performance in this area. They usually employ a CNN to learn a shared feature representation, followed by several layers for attribute classification. To improve the representation ability of the model, many methods fuse information from different feature levels by element-wise addition or concatenation of coarse and fine feature maps. However, these methods neither fully exploit the interaction of multi-level convolutional feature maps for person attribute analysis nor consider the correlation of attributes for the same person. In this paper, we introduce a correlation feature that exploits the high-order interaction of coarse and fine feature maps to capture a robust representation from multi-level convolutional layers, and we use it as the image representation for person attribute recognition. Moreover, we propose an intra-person attribute loss to explicitly model the correlation of attributes for the same person. We evaluate the proposed model on the CIFAR-10, Berkeley Human Attributes, and PA-100K datasets. The experimental results show improved performance of the feature representation and the effectiveness of the intra-person attribute loss.
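To make the contrast in the abstract concrete, the following minimal sketch compares the two common fusion schemes (element-wise addition and concatenation) with a simple second-order interaction between coarse and fine feature maps. It is an illustration only: the abstract does not give the exact form of the correlation feature, so the pooled channel-wise product used here (a bilinear-pooling-style stand-in), the function names, and the toy values are all assumptions. Feature maps are represented as lists of channels, each channel a flat list of values over the same spatial positions.

```python
from itertools import product


def fuse_add(fine, coarse):
    """Element-wise addition; assumes equal channel counts and spatial sizes."""
    return [[f + c for f, c in zip(fch, cch)] for fch, cch in zip(fine, coarse)]


def fuse_concat(fine, coarse):
    """Channel concatenation; assumes equal spatial sizes only."""
    return fine + coarse


def fuse_bilinear(fine, coarse):
    """Assumed stand-in for a second-order interaction: for every channel
    pair (i, j), average fine[i][p] * coarse[j][p] over spatial positions p."""
    n = len(fine[0])
    return [
        sum(fi[p] * cj[p] for p in range(n)) / n
        for fi, cj in product(fine, coarse)
    ]


# Toy maps (hypothetical values): 2 fine and 3 coarse channels, 4 positions each
fine = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
coarse = [[1.0, 1.0, 1.0, 1.0], [2.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 4.0]]

descriptor = fuse_bilinear(fine, coarse)
print(len(descriptor))  # one entry per (fine, coarse) channel pair: 2 * 3 = 6
```

Addition and concatenation keep each channel's response separate, whereas the pairwise product yields one value per channel pair, so the descriptor reflects how strongly fine and coarse responses coincide at the same locations.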


Notes

  1. Here wi equals wj, and Hi equals Hj.

  2. Compared with the pre-trained model, which has 9.3 million parameters and 628 million FLOPs, our proposed model has 9.5 million parameters and 641 million FLOPs.

  3. Convolutional layers with a kernel size of 3 × 3 are used to make the feature maps the same size.

  4. All models are pre-trained on the ImageNet dataset.

  5. Most methods do not report the number of parameters or the computational complexity, so we compare only with Deep-Mar and LG-NET.
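
The overhead reported in note 2 can also be read in relative terms. A quick check of the arithmetic, using only the figures quoted in that note:

```latex
\[
\frac{9.5 - 9.3}{9.3} \approx 2.2\% \ \text{(parameters)}, \qquad
\frac{641 - 628}{628} \approx 2.1\% \ \text{(FLOPs)}
\]
```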


Author information

Corresponding author

Correspondence to Junyong Ye.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sun, Z., Ye, J., Wang, T. et al. Exploiting interaction of fine and coarse features and attribute co-occurrence for person attribute recognition. Multimed Tools Appl 80, 11887–11902 (2021). https://doi.org/10.1007/s11042-020-10108-z

  • DOI: https://doi.org/10.1007/s11042-020-10108-z
