An E-Commerce Dataset in French for Multi-modal Product Categorization and Cross-Modal Retrieval

Amoualian, Hesam; Goswami, Parantapa; Das, Pradipto; Montalvo, Pablo; Ach, Laurent; Dean, Nathaniel R.

doi:10.1007/978-3-030-72113-8_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12656)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

A multi-modal dataset of ninety-nine thousand product listings is made available from the production catalog of Rakuten France, a major e-commerce platform. Each product in the catalog data contains a textual title, a (possibly empty) textual description and an associated image. The dataset has been released as part of a data challenge hosted by the SIGIR ECom’20 Workshop. Two tasks are proposed: a principal large-scale multi-modal classification task and a subsidiary cross-modal retrieval task. This real-world dataset contains around 85K products with their corresponding product type categories, released as training data, and around 9.5K and 4.5K products released as held-out test sets for the multi-modal classification and cross-modal retrieval tasks, respectively. The evaluation is run in two phases to measure system performance: first on 10% of the test data, and then on the remaining 90%. Systems are evaluated using the macro-F1 score for the multi-modal classification task and recall@1 for the cross-modal retrieval task. Additionally, a robust baseline system for the multi-modal classification task is proposed. The top performance obtained at the end of the second phase is 91.44% macro-F1 and 34.28% recall@1 for the two tasks, respectively.
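
As an illustration of the two evaluation metrics named in the abstract, the following is a minimal sketch in plain Python. It assumes the usual textbook definitions: macro-F1 as the unweighted mean of per-class F1 scores, and recall@1 as the fraction of queries whose top-ranked candidate is the correct item. The function names, the toy labels and the toy image identifiers are hypothetical, and the challenge's official scoring script may differ in details such as how classes absent from the predictions are handled.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    classes = set(y_true) | set(y_pred)
    # F1 = 2TP / (2TP + FP + FN); the denominator is positive for every
    # class that appears in y_true or y_pred.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(scores) / len(scores)


def recall_at_k(ranked_candidates, gold_items, k=1):
    """Fraction of queries whose gold item is among the top-k candidates."""
    hits = sum(
        1 for ranked, gold in zip(ranked_candidates, gold_items)
        if gold in ranked[:k]
    )
    return hits / len(gold_items)


# Toy classification example (labels are made up).
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
f1 = macro_f1(y_true, y_pred)

# Toy retrieval example: each text query returns a ranked list of image ids.
ranked = [["img3", "img1"], ["img2", "img5"]]
gold = ["img3", "img5"]
r1 = recall_at_k(ranked, gold, k=1)
```

Because macro-F1 averages over classes rather than over products, a rare product category counts as much as a frequent one, which matters for a catalog with an uneven category distribution.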

H. Amoualian—Most of the work was performed while at RIT-Paris.

Notes

1. Rakuten France Multimodal Dataset in https://rit.rakuten.co.jp/data_release/.
2. Gross Merchandise Volume (GMV) is the total monetary value for merchandise sold through a particular marketplace over a certain period of time.
3. https://huggingface.co/transformers/summary.html.
4. https://huggingface.co/distilbert-base-multilingual-cased.

Author information

Authors and Affiliations

Rakuten Institute of Technology (RIT), Paris, France
Parantapa Goswami, Pablo Montalvo & Laurent Ach
Rakuten Institute of Technology (RIT), Boston, USA
Pradipto Das & Nathaniel R. Dean
Tessella Altran, Paris, France
Hesam Amoualian

Corresponding author

Correspondence to Parantapa Goswami.

Editor information

Editors and Affiliations

Radboud University Nijmegen, Nijmegen, The Netherlands
Djoerd Hiemstra
Department of Computer Science, Katholieke Universiteit Leuven, Heverlee, Belgium
Marie-Francine Moens
Toulouse Institute of Computer Science Research, Toulouse, France
Josiane Mothe
Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy
Raffaele Perego
Leipzig University, Leipzig, Germany
Martin Potthast
Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy
Fabrizio Sebastiani

About this paper

Cite this paper

Amoualian, H., Goswami, P., Das, P., Montalvo, P., Ach, L., Dean, N.R. (2021). An E-Commerce Dataset in French for Multi-modal Product Categorization and Cross-Modal Retrieval. In: Hiemstra, D., Moens, M.-F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12656. Springer, Cham. https://doi.org/10.1007/978-3-030-72113-8_2

DOI: https://doi.org/10.1007/978-3-030-72113-8_2
Published: 27 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72112-1
Online ISBN: 978-3-030-72113-8
eBook Packages: Computer Science, Computer Science (R0)
