Abstract
Predicting future sales volumes of fashion products is challenging because the market changes rapidly and recent products have little historical sales data. Because traditional forecasting methods and machine learning models often fail to address this problem, we propose a novel autoregressive multimodal transformer architecture that anticipates the sales volume of brand-new apparel items by capturing trends among interrelated attributes. We use real-world data from a fashion company, comprising a limited amount of historical time-series sales data and several influencing factors such as product images, textual descriptions, and temporal attributes. To mitigate the data scarcity, we investigate the impact of integrating exogenous knowledge from an e-tailer site, filtered to fashion apparel products. We also find that a zero-shot forecasting approach further aids forecasting when time-series sales data are minimal. By leveraging exogenous data, our approach achieves an MAE of 1.546 and a WAPE of 16.42 in comparison with existing benchmark models. This study demonstrates the potential of our autoregressive multimodal transformer to predict sales volumes more precisely and highlights the importance of incorporating zero-shot forecasting in the dynamic fashion industry.
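For readers unfamiliar with the two reported metrics, the following minimal sketch shows how MAE and WAPE are commonly computed. It assumes the widely used definition of WAPE (total absolute error divided by total absolute actual sales, expressed in percent); the exact formulation used in the paper is not stated in this section. The function names and the sample numbers are hypothetical and are not taken from the paper's data.

```python
from typing import Sequence


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error: the average of |actual - forecast|."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def wape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted absolute percentage error in percent (assumed definition):
    100 * sum(|actual - forecast|) / sum(|actual|)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WAPE is undefined when all actual values are zero")
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return 100.0 * total_error / total_actual


# Made-up weekly sales for one hypothetical item (not data from the paper).
actual_sales = [10, 0, 5, 5]
forecast_sales = [8, 1, 5, 7]

print(mae(actual_sales, forecast_sales))   # 1.25
print(wape(actual_sales, forecast_sales))  # 25.0
```

Unlike a per-period percentage error, this form of WAPE stays defined when individual periods have zero sales, which is common for newly launched items.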
Data Availability
The 9oz dataset is not publicly accessible; it was provided to us by the Nineounce (CNS) company purely for research purposes. The Naver dataset is derived from publicly available data on the Naver Shopping site, available at https://shopping.naver.com/. The Visuelle dataset can be found online at https://paperswithcode.com/dataset/visuelle.
Notes
https://paperswithcode.com/dataset/visuelle
https://shopping.naver.com/
https://paperswithcode.com/sota/image-captioning-on-coco-captions
Acknowledgements
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development (IITP-2023-RS-2023-00254177) grant funded by the Korea government (MSIT). We are also grateful to the Nineounce (CNS) company for providing the data for this research.
Author information
Contributions
Conceptualization: [Muralidharan Rajendran and Bonghee Hong]; Methodology: [Muralidharan Rajendran]; Software: [Muralidharan Rajendran]; Validation: [Muralidharan Rajendran]; Formal analysis: [Muralidharan Rajendran]; Investigation: [Muralidharan Rajendran and Bonghee Hong]; Resources: [Bonghee Hong]; Data curation: [Muralidharan Rajendran]; Writing - original draft preparation: [Muralidharan Rajendran]; Writing - review and editing: [Bonghee Hong]; Visualization: [Muralidharan Rajendran]; Supervision: [Bonghee Hong]; Project administration: [Bonghee Hong]; Funding acquisition: [Bonghee Hong]
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical and informed consent for data used
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent for publication
Both authors have read and agreed to the published version of the manuscript.
About this article
Cite this article
Rajendran, M., Hong, B. Autoregressive multimodal transformer for zero-shot sales forecasting of fashion products with exogenous data. Appl Intell 55, 108 (2025). https://doi.org/10.1007/s10489-024-05972-3
DOI: https://doi.org/10.1007/s10489-024-05972-3