Abstract
Person attribute recognition, i.e., the prediction of a fixed set of semantic attributes given an image of a person, has become an important topic in computer vision. Recently, methods based on convolutional neural networks (CNNs) have shown outstanding performance in this area. They usually employ a CNN to learn a shared feature representation, followed by several layers for attribute classification. To improve the representation ability of the model, many methods fuse information from different feature levels by element-wise addition or concatenation of coarse and fine feature maps. However, these methods do not fully exploit the interaction of multi-level convolutional feature maps for person attribute analysis, nor do they consider the correlation among attributes of the same person. In this paper, we introduce a correlation feature that exploits the high-order interaction of coarse and fine feature maps to capture a robust representation from multi-level convolutional layers, and we use it as the image representation for person attribute recognition. Moreover, we propose an intra-person attribute loss to explicitly model the correlation among attributes of the same person. We evaluate the proposed model on the CIFAR-10, Berkeley Human Attributes, and PA-100K datasets. The experimental results show that the feature representation improves performance and that the intra-person attribute loss is effective.
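To make the fusion strategies mentioned in the abstract concrete, the following is a minimal NumPy sketch that contrasts element-wise addition, channel concatenation, and a generic second-order interaction (spatially averaged outer products, as in bilinear pooling). It is an illustration under stated assumptions, not the correlation feature defined in the paper: the function names, the `(C, H, W)` layout, and the choice of bilinear pooling as the high-order interaction are all hypothetical.

```python
import numpy as np


def fuse_add(fine, coarse):
    """Element-wise addition; both maps must share the shape (C, H, W)."""
    return fine + coarse


def fuse_concat(fine, coarse):
    """Channel concatenation; the output has C_fine + C_coarse channels."""
    return np.concatenate([fine, coarse], axis=0)


def fuse_second_order(fine, coarse):
    """Generic second-order interaction (assumed here, not the paper's method).

    At every spatial position the channel vector of the fine map is
    outer-multiplied with the channel vector of the coarse map, and the
    resulting matrices are averaged over all positions.
    """
    c_f, h, w = fine.shape
    c_c = coarse.shape[0]
    f = fine.reshape(c_f, h * w)    # (C_fine, H*W)
    g = coarse.reshape(c_c, h * w)  # (C_coarse, H*W)
    return (f @ g.T) / (h * w)      # (C_fine, C_coarse)


# Toy feature maps that already share the same spatial size.
rng = np.random.default_rng(0)
fine = rng.standard_normal((4, 8, 8))
coarse = rng.standard_normal((4, 8, 8))

added = fuse_add(fine, coarse)
stacked = fuse_concat(fine, coarse)
interaction = fuse_second_order(fine, coarse)

print(added.shape)        # (4, 8, 8)
print(stacked.shape)      # (8, 8, 8)
print(interaction.shape)  # (4, 4)
```

Addition and concatenation keep the two maps' channels independent until later layers mix them, whereas the second-order variant encodes every pairwise channel product directly. That pairwise structure is the kind of cross-level interaction the abstract argues simple fusion leaves unexploited.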
Notes
Here wi equals wj, and Hi equals Hj.
Compared with the pre-trained model, which has 9.3 million parameters and 628 million FLOPs, our proposed model has 9.5 million parameters and 641 million FLOPs.
Convolutional layers with a kernel size of 3 × 3 are used to make the feature maps the same size.
All models are pre-trained on the ImageNet dataset.
Most methods do not report their number of parameters or computational complexity, so we compare only with Deep-Mar and LG-NET.
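As a quick worked check of the parameter and FLOP counts quoted in the notes above, the relative overhead of the proposed model over the pre-trained model follows directly from the stated figures (the rounding is ours):

```latex
\[
\frac{9.5 - 9.3}{9.3} \approx 2.2\% \ \text{(parameters)},
\qquad
\frac{641 - 628}{628} \approx 2.1\% \ \text{(FLOPs)}
\]
```

Both overheads are close to 2%.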
Cite this article
Sun, Z., Ye, J., Wang, T. et al. Exploiting interaction of fine and coarse features and attribute co-occurrence for person attribute recognition. Multimed Tools Appl 80, 11887–11902 (2021). https://doi.org/10.1007/s11042-020-10108-z
DOI: https://doi.org/10.1007/s11042-020-10108-z