Images Don't Lie: Transferring Deep Visual Semantic Features to Large-Scale Multimodal Learning to Rank

Authors: Kamelia Aryafar, Josh Attenberg

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 541 - 548

https://doi.org/10.1145/2939672.2939728

Published: 13 August 2016

Abstract

Search is at the heart of modern e-commerce. As a result, the task of ranking search results automatically (learning to rank) is a multibillion-dollar machine learning problem. Traditional models optimize over a few hand-constructed features based on the item's text. In this paper, we introduce a multimodal learning to rank model that combines these traditional features with visual semantic features transferred from a deep convolutional neural network. In a large-scale experiment using data from the online marketplace Etsy, we verify that moving to a multimodal representation significantly improves ranking quality. We show how image features can capture fine-grained style information that is not available in a text-only representation. We also show concrete examples in which image information disentangles pairs of very different items that a text-only model ranks similarly.
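The idea of combining text features with transferred image features in one ranker can be illustrated with a small sketch. The abstract does not specify the fusion scheme or the loss, so everything below is an assumption made for illustration: features are fused by simple concatenation, the ranker is linear and trained with a pairwise logistic loss, random vectors stand in for real text features and CNN embeddings, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    """Scale each row to unit length so no modality dominates by magnitude."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)

def fuse(text_feats, image_feats):
    """Assumed early fusion: normalize each modality, then concatenate."""
    return np.hstack([l2_normalize(text_feats), l2_normalize(image_feats)])

def pairwise_examples(X, relevance):
    """Turn graded items for one query into (x_i - x_j, +1/-1) examples."""
    diffs, labels = [], []
    for i in range(len(X)):
        for j in range(len(X)):
            if relevance[i] > relevance[j]:
                diffs.append(X[i] - X[j])
                labels.append(1.0)
                diffs.append(X[j] - X[i])
                labels.append(-1.0)
    return np.array(diffs), np.array(labels)

def train_linear_ranker(D, y, lr=0.5, epochs=300):
    """Minimize the pairwise logistic loss with plain gradient descent."""
    w = np.zeros(D.shape[1])
    for _ in range(epochs):
        margins = y * (D @ w)
        grad = -(D * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

def pairwise_accuracy(scores, relevance):
    """Fraction of (more relevant, less relevant) pairs ordered correctly."""
    correct, total = 0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if relevance[i] > relevance[j]:
                total += 1
                correct += int(scores[i] > scores[j])
    return correct / total

# Synthetic stand-ins: text features carry no signal here, while relevance
# depends on one image dimension (a toy proxy for visual style).
n_items = 40
text = rng.normal(size=(n_items, 5))
image = rng.normal(size=(n_items, 4))
relevance = (image[:, 0] > 0).astype(int)

X_text = l2_normalize(text)
X_multi = fuse(text, image)

w_text = train_linear_ranker(*pairwise_examples(X_text, relevance))
w_multi = train_linear_ranker(*pairwise_examples(X_multi, relevance))

acc_text = pairwise_accuracy(X_text @ w_text, relevance)
acc_multi = pairwise_accuracy(X_multi @ w_multi, relevance)
print(f"text-only pairwise accuracy:  {acc_text:.2f}")
print(f"multimodal pairwise accuracy: {acc_multi:.2f}")
```

The toy data is built so that only the image channel is informative, which is why the fused ranker orders more pairs correctly here. It says nothing about the size of the effect on real marketplace data.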

Index Terms

Images Don't Lie: Transferring Deep Visual Semantic Features to Large-Scale Multimodal Learning to Rank
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Classification and regression trees
      2. Neural networks
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Learning to rank
  2. World Wide Web
    1. Web searching and information discovery
      1. Content ranking

Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 2016

2176 pages

ISBN:9781450342322

DOI:10.1145/2939672

General Chairs: Balaji Krishnapuram (IBM), Mohak Shah (Bosch)

Program Chairs: Alex Smola (Amazon), Charu Aggarwal (IBM), Dou Shen (Baidu), Rajeev Rastogi (Amazon)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD '16

KDD '16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 13 - 17, 2016

San Francisco, California, USA

Acceptance Rates

KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

