Abstract
Compiling and managing large e-commerce catalogs is a difficult and time-consuming task for retailers. In particular, deriving standardized, structured descriptions from unstructured data modalities, such as text and images, is crucial to search engine performance and to the general organization of virtual store databases. In this paper, we propose methodologies and strategies based on Deep Learning classifiers to structure, update, and inspect large e-commerce catalogs. To this end, we exploit multimodal representations that combine images and unstructured textual descriptions to identify relevant labels for e-commerce applications. These data modalities are used to train deep neural network architectures, which are then able to recognize attributes automatically. We investigated three classes of architecture: variations of the VGG architecture for recognition from images; architectures combining embedding, convolutional, and recurrent layers for recognition from text; and hybrid architectures that combine elements of the previous two. We also propose tools for detecting insufficiently descriptive visual and textual data, which can later be improved manually, and for automatically enhancing attribute annotations through neural network predictions. Using a database collected with a web crawler from a large e-commerce site, our experiments show that hybrid architectures achieve better classification results by combining both types of data. Finally, we present a case study that demonstrates the potential of our strategy for detecting insufficiently descriptive data. We conclude that the proposed tools are effective for rectifying, enhancing, and efficiently updating e-commerce catalogs.
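The hybrid idea summarized above can be illustrated with a minimal sketch. The abstract does not state how the two modalities are fused or how insufficiently descriptive items are detected, so everything here is an assumption made for illustration: fusion is done by concatenating an image feature vector with a text feature vector, a single dense layer with softmax stands in for the classifier head, and an item is flagged for manual review when its top predicted probability falls below a hypothetical threshold. The function names and the threshold value are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(
    image_features: Sequence[float],
    text_features: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> List[float]:
    """Concatenate both modalities and apply one dense layer plus softmax.

    Assumption: concatenation is used as the fusion step. `weights` holds
    one row per attribute class, each as long as the fused vector.
    """
    fused = list(image_features) + list(text_features)
    for row in weights:
        if len(row) != len(fused):
            raise ValueError("each weight row must match the fused vector length")
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


def flag_low_confidence(
    probs: Sequence[float], threshold: float = 0.6
) -> Tuple[int, bool]:
    """Return (predicted class index, needs_review).

    Assumption: low top-class probability is used as a proxy for an
    insufficiently descriptive item; the 0.6 threshold is hypothetical.
    """
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best] < threshold
```

In practice the feature vectors would come from trained image and text branches rather than being supplied by hand, and the review threshold would be tuned on held-out catalog data.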
Acknowledgements
The authors would like to thank the Alagoas Research Foundation (FAPEAL) for the first author’s scholarship #60030001626/2018.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Sales, L.F., Pereira, A., Vieira, T. et al. Multimodal deep neural networks for attribute prediction and applications to e-commerce catalogs enhancement. Multimed Tools Appl 80, 25851–25873 (2021). https://doi.org/10.1007/s11042-021-10885-1
DOI: https://doi.org/10.1007/s11042-021-10885-1