Sitcom-star-based clothing retrieval for video advertising: a deep learning framework

Zhang, Haijun; Ji, Yuzhu; Huang, Wang; Liu, Linlin

doi:10.1007/s00521-018-3579-x

Original Article
Published: 07 June 2018

Volume 31, pages 7361–7380, (2019)
Neural Computing and Applications

Haijun Zhang¹, Yuzhu Ji¹, Wang Huang¹ & Linlin Liu¹

2247 Accesses
106 Citations

Abstract

This paper presents DeepLink, a novel learning-based framework for video content-based advertising that links sitcom stars to online shops through clothing retrieval, using state-of-the-art deep convolutional neural networks (CNNs). Specifically, several deep CNN models are adopted to build the sub-modules of DeepLink: human-body detection, human pose selection, face verification, clothing detection, and retrieval from an advertisement (ad) pool constructed from clothing images crawled from real-world online shops. For clothing detection and retrieval from ad images, we first transfer state-of-the-art deep CNN models to our data domain and then train the corresponding models on our constructed large-scale clothing datasets. Extensive experimental results demonstrate the feasibility and efficacy of the proposed clothing-based video advertising system.
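
The final retrieval step named in the abstract can be illustrated with a minimal sketch. The abstract does not state how DeepLink represents clothing images or which similarity measure it uses, so everything below is an assumption for illustration: the feature vectors are toy values standing in for CNN features, cosine similarity is a swapped-in ranking measure, and the names `cosine_similarity`, `retrieve_ads`, and the ad identifiers are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve_ads(query_feature, ad_pool, top_k=3):
    """Rank ad images by similarity to a query clothing feature.

    ad_pool maps a (hypothetical) ad identifier to its feature vector.
    Returns the top_k (ad_id, score) pairs, best match first.
    """
    scored = [
        (ad_id, cosine_similarity(query_feature, feature))
        for ad_id, feature in ad_pool.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional features; a real system would use CNN embeddings
# extracted from the detected clothing region (assumption).
ad_pool = {
    "ad_red_dress": [0.9, 0.1, 0.0],
    "ad_blue_jeans": [0.0, 0.2, 0.9],
    "ad_red_coat": [0.8, 0.3, 0.1],
}
query = [1.0, 0.1, 0.0]  # feature of clothing detected in a video frame
results = retrieve_ads(query, ad_pool, top_k=2)
for ad_id, score in results:
    print(ad_id, round(score, 3))
```

In this sketch the earlier sub-modules (body detection, pose selection, face verification, clothing detection) are assumed to have already produced the query feature; only the ranking against the ad pool is shown.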

Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant no. 2018YFB1003800, the Natural Science Foundation of China under Grant no. 61572156, and the Shenzhen Science and Technology Program under Grant nos. JCYJ20170413105929681 and JCYJ20170811161545863.

Author information

Authors and Affiliations

Shenzhen Graduate School, Harbin Institute of Technology, Xili University Town, Shenzhen, 518055, People’s Republic of China
Haijun Zhang, Yuzhu Ji, Wang Huang & Linlin Liu

Corresponding author

Correspondence to Yuzhu Ji.

Ethics declarations

Conflict of interest

The authors declared that they have no conflict of interest.

About this article

Cite this article

Zhang, H., Ji, Y., Huang, W. et al. Sitcom-star-based clothing retrieval for video advertising: a deep learning framework. Neural Comput & Applic 31, 7361–7380 (2019). https://doi.org/10.1007/s00521-018-3579-x

Received: 13 February 2018
Accepted: 30 May 2018
Published: 07 June 2018
Issue Date: November 2019
DOI: https://doi.org/10.1007/s00521-018-3579-x
