research-article

Cross-domain Beauty Item Retrieval via Unsupervised Embedding Learning

Authors:

Qing Li

MM '19: Proceedings of the 27th ACM International Conference on Multimedia

Pages 2543 - 2547

https://doi.org/10.1145/3343031.3356055

Published: 15 October 2019

Abstract

Cross-domain image retrieval often suffers from insufficient labelled data in real-world settings. In this paper, we propose unsupervised embedding learning (UEL) to fine-tune a convolutional neural network (CNN) for cross-domain beauty and personal care product retrieval. More specifically, UEL uses a non-parametric softmax to train the CNN as an instance-level classifier, which reduces the influence of unavoidable nuisance factors (e.g., shape variations). To obtain better performance, we integrate a few existing retrieval methods trained on different datasets. Furthermore, a query expansion strategy (i.e., diffusion) is adopted to further improve retrieval performance. Extensive experiments on Perfect-500K, a dataset of half a million images of beauty and personal care product items, demonstrate the effectiveness of the proposed method. Our approach achieved 2nd place on the leaderboard of the Grand Challenge of AI Meets Beauty at ACM Multimedia 2019. Our code is available at: https://github.com/RetrainIt/Perfect-Half-Million-Beauty-Product-Image-Recognition-Challenge-2019.
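
To make the idea of instance-level classification with a non-parametric softmax more concrete, the following is a minimal sketch in plain Python. It is not taken from the paper or its repository. It assumes the commonly used form in which each image is its own class, the class "weights" are simply the stored L2-normalized embeddings of all images, and the probability of instance i given an embedding v is exp(v_i · v / τ) / Σ_j exp(v_j · v / τ). The function names, the `bank` list of embeddings, and the temperature value τ = 0.1 are illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def instance_probabilities(query, bank, temperature=0.1):
    """Non-parametric softmax over instances.

    `bank` holds one embedding per image (hypothetical memory of features).
    There is no learned classifier weight matrix: the logits are cosine
    similarities between the query and each stored embedding, divided by
    the temperature (an assumed hyperparameter).
    """
    q = l2_normalize(query)
    logits = [
        sum(a * b for a, b in zip(q, l2_normalize(v))) / temperature
        for v in bank
    ]
    # Subtract the maximum logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def instance_loss(query, bank, index, temperature=0.1):
    """Negative log-likelihood of assigning `query` to instance `index`."""
    return -math.log(instance_probabilities(query, bank, temperature)[index])


# Toy example: three stored instances and a slightly perturbed view of instance 0.
bank = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
augmented_view = [0.9, 0.1]
probs = instance_probabilities(augmented_view, bank)
loss_correct = instance_loss(augmented_view, bank, index=0)
loss_wrong = instance_loss(augmented_view, bank, index=1)
```

In this toy setup the perturbed view is most similar to the first stored embedding, so minimizing the loss for index 0 pulls views of the same image together while the softmax denominator pushes other instances apart. In a real training loop the embeddings would come from the CNN and the loss would be backpropagated through it.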

Cited By

Xu K, Liu Y, Feng M, Zhao J, Wang H, Cai H. (2020). Multi-Scale Generalized Attention-Based Regional Maximum Activation of Convolutions for Beauty Product Retrieval. In Wen Chen C, Cucchiara R, Hua X, Qi G, Ricci E, Zhang Z, Zimmermann R (Eds.), Proceedings of the 28th ACM International Conference on Multimedia, 4733-4737. Online publication date: 12-Oct-2020. https://dl.acm.org/doi/10.1145/3394171.3416293

Vu T, Dang A, Wang J. (2020). Learning to Remember Beauty Products. In Wen Chen C, Cucchiara R, Hua X, Qi G, Ricci E, Zhang Z, Zimmermann R (Eds.), Proceedings of the 28th ACM International Conference on Multimedia, 4728-4732. Online publication date: 12-Oct-2020. https://dl.acm.org/doi/10.1145/3394171.3416281

Hou J, Ji S, Wang A. (2020). Attention-driven Unsupervised Image Retrieval for Beauty Products with Visual and Textual Clues. In Wen Chen C, Cucchiara R, Hua X, Qi G, Ricci E, Zhang Z, Zimmermann R (Eds.), Proceedings of the 28th ACM International Conference on Multimedia, 4718-4722. Online publication date: 12-Oct-2020. https://dl.acm.org/doi/10.1145/3394171.3416271

Index Terms

Cross-domain Beauty Item Retrieval via Unsupervised Embedding Learning
1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        Image search

Published In

MM '19: Proceedings of the 27th ACM International Conference on Multimedia

October 2019

2794 pages

ISBN:9781450368896

DOI:10.1145/3343031

General Chairs: Laurent Amsaleg (CNRS-IRISA, France); Benoit Huet (EURECOM, France); Martha Larson (Radboud University and TU Delft, Netherlands)

Program Chairs: Guillaume Gravier (CNRS-IRISA, France); Hayley Hung (Delft University of Technology, Netherlands); Chong-Wah Ngo (City University of Hong Kong, Hong Kong); Wei Tsang Ooi (National University of Singapore, Singapore)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '19

Sponsor:

SIGMM

MM '19: The 27th ACM International Conference on Multimedia

October 21 - 25, 2019

Nice, France

Acceptance Rates

MM '19 Paper Acceptance Rate 252 of 936 submissions, 27%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total Citations: 3
Total Downloads: 309
Downloads (Last 12 months): 12
Downloads (Last 6 weeks): 1

Reflects downloads up to 20 Jan 2025
