Regional Maximum Activations of Convolutions with Attention for Cross-domain Beauty and Personal Care Product Retrieval

Authors:

Junhong Chen

MM '18: Proceedings of the 26th ACM international conference on Multimedia

Pages 2073 - 2077

https://doi.org/10.1145/3240508.3266436

Published: 15 October 2018

Abstract

Cross-domain beauty and personal care product image retrieval is a challenging problem due to data variations (e.g., brightness, viewpoint, and scale) and the wide variety of item types. In this paper, we present a regional maximum activations of convolutions with attention (RA-MAC) descriptor to extract image features for retrieval. RA-MAC improves on the regional maximum activations of convolutions (R-MAC) descriptor by accounting for the influence of background in cross-domain images (i.e., the shopper domain and the seller domain). More specifically, RA-MAC uses the characteristics of the convolutional layer to locate the attention of an image and reduces the influence of unimportant regions in an unsupervised manner. Furthermore, several strategies are used to improve performance, such as multiple-feature fusion, query expansion, and database augmentation. Extensive experiments on Perfect-500K, a dataset of half a million images of beauty care products, demonstrate the effectiveness of RA-MAC. Our approach achieved 2nd place on the leaderboard of the Grand Challenge of AI Meets Beauty at ACM Multimedia 2018. Our code is available at: https://github.com/RetrainIt/Perfect-Half-Million-Beauty-Product-Image-Recognition-Challenge.
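
The abstract gives no formulas, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes that the attention map is the channel-wise sum of non-negative convolutional activations, that regions come from a simple non-overlapping grid (R-MAC itself samples overlapping regions), and that each region's max-pooled vector is weighted by the mean attention inside it. The function names, the grid sampler, and the `top_k` parameter are all hypothetical. A basic average query expansion step is included for illustration.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    return v / (np.linalg.norm(v) + eps)

def attention_map(fmap):
    """Assumed attention: sum activations over channels, rescale to [0, 1].

    fmap has shape (C, H, W) and is expected to be non-negative (post-ReLU).
    """
    a = fmap.sum(axis=0)
    return a / (a.max() + 1e-12)

def grid_regions(h, w, scales=(1, 2)):
    """Hypothetical region sampler: an s x s grid of cells for each scale s.

    Requires h and w to be at least max(scales) so that no cell is empty.
    """
    regions = []
    for s in scales:
        ys = np.linspace(0, h, s + 1).astype(int)
        xs = np.linspace(0, w, s + 1).astype(int)
        for i in range(s):
            for j in range(s):
                regions.append((ys[i], ys[i + 1], xs[j], xs[j + 1]))
    return regions

def attention_weighted_rmac(fmap, scales=(1, 2)):
    """Sum of region-wise max-pooled vectors, each weighted by mean attention."""
    c, h, w = fmap.shape
    att = attention_map(fmap)
    desc = np.zeros(c)
    for y0, y1, x0, x1 in grid_regions(h, w, scales):
        region_vec = l2_normalize(fmap[:, y0:y1, x0:x1].max(axis=(1, 2)))
        weight = att[y0:y1, x0:x1].mean()  # low-attention regions count less
        desc += weight * region_vec
    return l2_normalize(desc)

def average_query_expansion(query, database, top_k=3):
    """Re-issue the query as the normalized sum of itself and its top_k matches.

    database has shape (N, D) with unit-norm rows, so the dot product is
    cosine similarity.
    """
    sims = database @ query
    top = np.argsort(-sims)[:top_k]
    return l2_normalize(query + database[top].sum(axis=0))
```

Under these assumptions, regions dominated by background receive a small weight and contribute little to the final descriptor. Database augmentation can be sketched the same way by applying the expansion step to each database vector instead of the query.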

Cited By

Dong X, Zhan X, Wei Y, Wei X, Wang Y, Lu M, Cao X, Liang X. (2023). Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-16. Online publication date: 2023.
https://doi.org/10.1109/TPAMI.2023.3291237
Xu M, Wang Y, Xu B, Zhang J, Ren J, Huang Z, Poslad S, Xu P. (2023). A critical analysis of image-based camera pose estimation techniques. Neurocomputing, 127125. Online publication date: Dec-2023.
https://doi.org/10.1016/j.neucom.2023.127125
Singh A, Dixit A, Kumar Singh B. (2022). Hypertuned Convolutional Neural Network Residual Model Based Content Based Image Retrival System. 2022 International Conference on Fourth Industrial Revolution Based Technology and Practices (ICFIRTP), 139-144. Online publication date: 23-Nov-2022.
https://doi.org/10.1109/ICFIRTP56122.2022.10059435
Index Terms

Regional Maximum Activations of Convolutions with Attention for Cross-domain Beauty and Personal Care Product Retrieval
1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        Image search

Information & Contributors

Information

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia

October 2018

2167 pages

ISBN: 9781450356657

DOI: 10.1145/3240508

General Chairs: Susanne Boll (University of Oldenburg, Germany), Kyoung Mu Lee (Seoul National University, Korea), Jiebo Luo (University of Rochester, USA), Wenwu Zhu (Tsinghua University, China)

Program Chairs: Hyeran Byun (Yonsei University, Korea), Chang Wen Chen (State Univ. of New York at Buffalo, USA), Rainer Lienhart (University of Augsburg, Germany), Tao Mei (JD AI, China)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2018

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Guangdong Innovative Research Team Program

Conference

MM '18

Sponsor:

SIGMM

MM '18: ACM Multimedia Conference

October 22 - 26, 2018

Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 21
Total Downloads: 450
Downloads (Last 12 months): 18
Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Mar 2025

Citations

Cited By

Li S, Guo Y, Ren H, Wang Z, Ren K, Liu C, Lin H, Shi J. (2022). FCNet: A feature context network based on ensemble framework for image retrieval. IET Computer Vision, 16:4, 295-306. Online publication date: 24-Jan-2022.
https://doi.org/10.1049/cvi2.12088
Ning C, Di Y, Menglu L. (2022). Survey on clothing image retrieval with cross-domain. Complex & Intelligent Systems, 8:6, 5531-5544. Online publication date: 13-May-2022.
https://doi.org/10.1007/s40747-022-00750-5
Cheng Q, Gan D, Fu P, Huang H, Zhou Y. (2021). A Novel Ensemble Architecture of Residual Attention-Based Deep Metric Learning for Remote Sensing Image Retrieval. Remote Sensing, 13:17, 3445. Online publication date: 30-Aug-2021.
https://doi.org/10.3390/rs13173445
Yang W, Hua Y. (2021). Cross-modal Retrieval based on Big Transfer and Regional Maximum Activation of Convolutions with Generalized Attention. Proceedings of the 2021 2nd International Conference on Control, Robotics and Intelligent System, 153-157. Online publication date: 20-Aug-2021.
https://dl.acm.org/doi/10.1145/3483845.3483872
Wang X, Ma L, Fu Y, Xue X, Cheng W, Kankanhalli M, Wang M, Chu W, Liu J, Worring M. (2021). Neural Symbolic Representation Learning for Image Captioning. Proceedings of the 2021 International Conference on Multimedia Retrieval, 312-321. Online publication date: 24-Aug-2021.
https://dl.acm.org/doi/10.1145/3460426.3463637
Li X, Xu G, Chen Z, Huang B. (2021). A Street View Image Retrieval Method Based on Fusion of Multiple Features. 2021 International Symposium on Computer Technology and Information Science (ISCTIS), 364-370. Online publication date: Jun-2021.
https://doi.org/10.1109/ISCTIS51085.2021.00081
Zhou Y, Fan H, Gao S, Yang Y, Zhang X, Li J, Guo Y. (2021). Retrieval and Localization with Observation Constraints. 2021 IEEE International Conference on Robotics and Automation (ICRA), 5237-5244. Online publication date: 30-May-2021.
https://doi.org/10.1109/ICRA48506.2021.9560987