Human parsing by weak structural label

Chen, Zhiyong; Liu, Si; Zhai, Yanlong; Lin, Jia; Cao, Xiaochun; Yang, Liang

doi:10.1007/s11042-017-5368-4

Published: 25 November 2017

Volume 77, pages 19795–19809, (2018)

Multimedia Tools and Applications

Zhiyong Chen¹, Si Liu², Yanlong Zhai³, Jia Lin⁴, Xiaochun Cao² & Liang Yang⁵

Abstract

Human parsing, which decomposes a human-centric image into semantic regions such as face and skin, has been an active research topic in recent years. Traditional human parsing methods are trained in a fully supervised setting, i.e., pixel-wise labels must be available during training, which requires tedious manual labeling. In this paper, we propose a weakly supervised deep parsing method that relieves annotators of this time-consuming labeling. More specifically, we train a robust human parser using only structural image-level labels, e.g., “red jeans”. A structural label contains an attribute, e.g., “red”, as well as a class label, e.g., “jeans”. Our framework is based on the Fully Convolutional Network (FCN) (Pathak et al. 2014), with two critical differences. First, the pixel-level loss function of FCN (Pathak et al. 2014) is replaced by an image-level loss that aggregates the pixel-wise predictions over the whole image in a multiple instance learning manner. Second, we develop a novel logistic pooling layer that interprets the structural label by constraining the pixels that respond to a color label and to its corresponding category label to be the same. Extensive experiments on a publicly available dataset (Liu et al. IEEE Trans Multimedia 16(1):253–265, 2014) show the effectiveness of the proposed method.
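
As a rough illustration of the image-level loss described in the abstract, the following Python sketch pools per-pixel class probabilities into one score per class and compares those scores with image-level labels. The max pooling, the binary cross-entropy, the function names, and the toy class names are all assumptions made for illustration; the abstract does not specify the aggregation, and the paper's logistic pooling layer is not reproduced here.

```python
import math


def image_level_scores(pixel_probs):
    """Pool per-pixel class probabilities into one score per class.

    pixel_probs: list of pixels, each a list of per-class probabilities.
    Max pooling over pixels is an assumption, chosen in the spirit of
    multiple instance learning (one confident pixel explains the label).
    """
    num_classes = len(pixel_probs[0])
    return [max(pixel[c] for pixel in pixel_probs) for c in range(num_classes)]


def image_level_loss(pixel_probs, image_labels, eps=1e-7):
    """Mean binary cross-entropy between pooled scores and image labels."""
    scores = image_level_scores(pixel_probs)
    total = 0.0
    for score, label in zip(scores, image_labels):
        score = min(max(score, eps), 1.0 - eps)  # avoid log(0)
        total -= label * math.log(score) + (1 - label) * math.log(1.0 - score)
    return total / len(scores)


# Toy image: 3 pixels, 2 hypothetical classes ("jeans", "skirt").
pixels = [[0.9, 0.1], [0.2, 0.05], [0.6, 0.2]]
labels = [1, 0]  # image-level label: "jeans" present, "skirt" absent

scores = image_level_scores(pixels)
loss = image_level_loss(pixels, labels)
print(scores, round(loss, 4))
```

Only the image-level labels enter the loss, so no pixel-wise annotation is needed; the pixel-wise predictions are still produced and can be read out as a parsing map at test time.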

Notes

http://www.chictopia.com/

References

Chen H, Gallagher A, Girod B (2012) Describing clothing by semantic attributes. In: Computer vision–ECCV 2012, pp 609–623
Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2014) Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv:1412.7062
Chen Q, Huang J, Feris R, Brown LM, Dong J, Yan S (2015) Deep domain adaptation for describing people based on fine-grained clothing attributes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5315–5324
Chopra S, Hadsell R, LeCun Y (2005) Learning a similarity metric discriminatively, with application to face verification. In: IEEE computer society conference on computer vision and pattern recognition, 2005. CVPR 2005, vol 1. IEEE, pp 539–546
Deng Y, Luo P, Loy CC, Tang X (2015) Learning to recognize pedestrian attribute. arXiv:1501.00901
Donahue J, Anne Hendricks L, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2625–2634
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Hariharan B, Arbeláez P, Girshick R, Malik J (2015) Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 447–456
Hong S, Noh H, Han B (2015) Decoupled deep neural network for semi-supervised semantic segmentation. In: Advances in neural information processing systems, pp 1495–1503
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning, pp 448–456
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia. ACM, pp 675–678
Karpathy A, Fei-Fei L (2015) Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3128–3137
Liang X, Liu S, Shen X, Yang J, Liu L, Dong J, Lin L, Yan S (2015) Deep human parsing with active template regression. IEEE Trans Pattern Anal Mach Intell 37(12):2402–2414
Liu S, Feng J, Song Z, Zhang T, Lu H, Xu C, Yan S (2012) Hi, magic closet, tell me what to wear! In: Proceedings of the 20th ACM international conference on multimedia. ACM, pp 619–628
Liu S, Song Z, Liu G, Xu C, Lu H, Yan S (2012) Street-to-shop: cross-scenario clothing retrieval via parts alignment and auxiliary set. In: IEEE conference on computer vision and pattern recognition (CVPR), 2012. IEEE, pp 3330–3337
Liu S, Feng J, Domokos C, Xu H, Huang J, Hu Z, Yan S (2014) Fashion parsing with weak color-category labels. IEEE Trans Multimedia 16(1):253–265
Liu S, Liang X, Liu L, Lu K, Lin L, Cao X, Yan S (2015) Fashion parsing with video context. IEEE Trans Multimedia 17(8):1347–1358
Liu S, Liang X, Liu L, Shen X, Yang J, Xu C, Lin L, Cao X, Yan S (2015) Matching-cnn meets knn: quasi-parametric human parsing. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1419–1427
Liu S, Wang C, Qian R, Yu H, Bao R (2016) Surveillance video parsing with single frame supervision. arXiv:1611.09587
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Luo P, Wang X, Tang X (2013) Pedestrian parsing via deep decompositional network. In: Proceedings of the IEEE international conference on computer vision, pp 2648–2655
Papandreou G, Chen LC, Murphy KP, Yuille AL (2015) Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 1742–1750
Pathak D, Shelhamer E, Long J, Darrell T (2014) Fully convolutional multi-class multiple instance learning. arXiv:1412.7144
Pathak D, Krahenbuhl P, Darrell T (2015) Constrained convolutional neural networks for weakly supervised segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 1796–1804
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Van De Weijer J, Schmid C, Verbeek J (2007) Learning color names from real-world images. In: IEEE conference on computer vision and pattern recognition, 2007. CVPR’07. IEEE, pp 1–8
Wang C, Yang H, Meinel C (2016) A deep semantic framework for multimodal representation learning. Multimedia Tools Appl 75(15):9255–9276
Wang H, Cai Y, Chen X, Chen L (2016) Occluded vehicle detection with local connected deep model. Multimedia Tools Appl 75(15):9277–9293
Yamaguchi K, Kiapour MH, Ortiz LE, Berg TL (2012) Parsing clothing in fashion photographs. In: IEEE conference on computer vision and pattern recognition (CVPR), 2012. IEEE, pp 3570–3577
Yamaguchi K, Hadi Kiapour M, Berg TL (2013) Paper doll parsing: retrieving similar styles to parse clothing items. In: Proceedings of the IEEE international conference on computer vision, pp 3519–3526
Yang W, Luo P, Lin L (2014) Clothing co-parsing by joint image segmentation and labeling. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3182–3189
Zhang H, Zha ZJ, Yang Y, Yan S, Gao Y, Chua TS (2013) Attribute-augmented semantic hierarchy: towards bridging semantic gap and intention gap in image retrieval. In: Proceedings of the 21st ACM international conference on multimedia. ACM, pp 33–42
Zhang H, Shen F, Liu W, He X, Luan H, Chua TS (2016) Discrete collaborative filtering. In: Proceedings of the 39th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 325–334
Zhang H, Kyaw Z, Chang SF, Chua TS (2017) Visual translation embedding network for visual relation detection. arXiv:1702.08319
Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A (2014) Object detectors emerge in deep scene cnns. arXiv:1412.6856
Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A (2014) Learning deep features for scene recognition using places database. In: Advances in neural information processing systems, pp 487–495

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61572493, No. 61503281, Grant U1536203, Grant 61602037).

Author information

Authors and Affiliations

School of Information Science and Engineering, Lanzhou University, No. 222, TianShui south Road, ChengGuan District, Lanzhou, Gansu, China
Zhiyong Chen
State Key Laboratory of Information Security, Institute of Information Engineering, CAS, No. 89, Minzhuang Road, Haidian District, Beijing, China
Si Liu & Xiaochun Cao
School of Computer Science and Technology, Beijing Institute of Technology, No. 5 Zhongguancun South Street, Haidian District, Beijing, China
Yanlong Zhai
JD.com, North Star Century Center, 10th floor, Beijing, China
Jia Lin
School of Information Engineering, Tianjin University of Commerce, No. 409 Guangrong Road, Beichen District, Tianjin, China
Liang Yang

Corresponding author

Correspondence to Yanlong Zhai.

About this article

Cite this article

Chen, Z., Liu, S., Zhai, Y. et al. Human parsing by weak structural label. Multimed Tools Appl 77, 19795–19809 (2018). https://doi.org/10.1007/s11042-017-5368-4

Received: 02 January 2017
Revised: 01 September 2017
Accepted: 30 October 2017
Published: 25 November 2017
Issue Date: August 2018
DOI: https://doi.org/10.1007/s11042-017-5368-4
