ABSTRACT
Cross-domain image retrieval often suffers from insufficient labelled data in real-world settings. In this paper, we propose unsupervised embedding learning (UEL) for cross-domain beauty and personal care product retrieval, which we use to fine-tune a convolutional neural network (CNN). More specifically, UEL uses the non-parametric softmax to train the CNN as an instance-level classifier, which mitigates the influence of unavoidable problems such as shape variations. To obtain better performance, we integrate several existing retrieval methods trained on different datasets. Furthermore, a query expansion strategy (i.e., diffusion) is adopted to further improve performance. Extensive experiments on Perfect-500K, a dataset of half a million images of beauty and personal care products, demonstrate the effectiveness of our proposed method. Our approach achieved 2nd place on the leaderboard of the Grand Challenge of AI Meets Beauty at ACM Multimedia 2019. Our code is available at: https://github.com/RetrainIt/Perfect-Half-Million-Beauty-Product-Image-Recognition-Challenge-2019.
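As an illustration of the instance-level classification idea mentioned in the abstract, the following is a minimal NumPy sketch of a non-parametric softmax, in which each training image is its own class and the class "weights" are the image embeddings themselves. It is not taken from the authors' repository: the function names, the temperature value of 0.1, and the random toy embeddings are all assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def instance_softmax(query, bank, temperature=0.1):
    """Probability that each query embedding belongs to each stored instance.

    query: (B, D) embeddings to classify.
    bank:  (N, D) one embedding per training image (hypothetical memory of instances).
    temperature: assumed value; it controls how peaked the distribution is.
    """
    logits = l2_normalize(query) @ l2_normalize(bank).T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def instance_nll(query, bank, targets, temperature=0.1):
    """Mean negative log-likelihood of the correct instance for each query."""
    probs = instance_softmax(query, bank, temperature)
    return float(-np.log(probs[np.arange(len(targets)), targets]).mean())

# Toy data: 5 instances with 8-dimensional embeddings (random, for illustration only).
rng = np.random.default_rng(0)
bank = rng.normal(size=(5, 8))
# Slightly perturbed copies stand in for augmented views of the same images.
augmented = bank + 0.01 * rng.normal(size=bank.shape)

probs = instance_softmax(augmented, bank)
loss = instance_nll(augmented, bank, np.arange(5))
```

In this sketch, minimizing `loss` would pull the embedding of an augmented view toward its own instance and push it away from all others, which is one way to read the abstract's claim that instance-level training makes the features less sensitive to variations such as shape.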
Index Terms
- Cross-domain Beauty Item Retrieval via Unsupervised Embedding Learning