
Autoregressive multimodal transformer for zero-shot sales forecasting of fashion products with exogenous data

Applied Intelligence

Abstract

Predicting future sales volumes of fashion products is challenging because markets change rapidly and little historical sales data exists for recently launched products. Traditional forecasting methods and machine learning models often fail to address this problem. We therefore propose a novel autoregressive multimodal transformer architecture that forecasts the sales volume of brand-new apparel items by capturing trends among interrelated attributes. We use real-world data from a fashion company. This data includes a limited amount of historical time-series sales data together with several influencing factors, such as product images, textual descriptions, and temporal attributes. To mitigate these data limitations, we investigate the impact of integrating exogenous knowledge from an e-tailer site, filtered to fashion apparel products. We also find that a zero-shot forecasting approach further improves forecasts when only minimal time-series sales data is available. By leveraging exogenous data, our approach achieves an MAE of 1.546 and a WAPE of 16.42, improving on existing benchmark models. This study demonstrates the potential of our autoregressive multimodal transformer to predict sales volumes more precisely. It also highlights the importance of incorporating a zero-shot forecasting approach in the dynamic fashion industry.
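The abstract reports results as MAE and WAPE. The sketch below shows how these two metrics are commonly computed. It assumes the usual definitions: MAE is the mean of |y − ŷ|, and WAPE is 100 × Σ|y − ŷ| / Σ|y|, reported as a percentage. The abstract does not state how errors are aggregated, for example whether they are computed per product and per week before averaging. Pooling all points together and the function names `mae` and `wape` are therefore illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both series must be non-empty and aligned point by point.
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |y_t - y_hat_t| over all points (assumed pooled)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def wape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Weighted absolute percentage error in percent: 100 * sum|y - y_hat| / sum|y|."""
    _check(y_true, y_pred)
    denom = sum(abs(t) for t in y_true)
    if denom == 0:
        # WAPE is undefined when every actual value is zero.
        raise ValueError("WAPE is undefined when all actual values are zero")
    return 100.0 * sum(abs(t - p) for t, p in zip(y_true, y_pred)) / denom


if __name__ == "__main__":
    # Hypothetical weekly sales of one new item and a model's forecasts.
    actual = [10, 0, 5, 5]
    predicted = [8, 1, 5, 7]
    print(mae(actual, predicted))   # 1.25
    print(wape(actual, predicted))  # 25.0
```

WAPE divides by total actual sales rather than by each point's own value. This keeps it defined for weeks with zero sales, which are frequent for new fashion items, whereas a pointwise percentage error such as MAPE would divide by zero in those weeks.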


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8


Data Availability

The 9oz dataset is not publicly accessible and has been provided to us by Nineounce (CNS) company purely for research purposes. The Naver dataset is derived from publicly available data on the Naver Shopping site, available at https://shopping.naver.com/. The Visuelle dataset can be found online at https://paperswithcode.com/dataset/visuelle.

Notes

  1. https://paperswithcode.com/dataset/visuelle

  2. https://shopping.naver.com/

  3. https://paperswithcode.com/sota/image-captioning-on-coco-captions


Acknowledgements

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development grant (IITP-2023-RS-2023-00254177), funded by the Korea government (MSIT). We are also grateful to the Nineounce (CNS) company for providing the data for this research.

Author information


Contributions

Conceptualization: Muralidharan Rajendran and Bonghee Hong; Methodology: Muralidharan Rajendran; Software: Muralidharan Rajendran; Validation: Muralidharan Rajendran; Formal analysis: Muralidharan Rajendran; Investigation: Muralidharan Rajendran and Bonghee Hong; Resources: Bonghee Hong; Data curation: Muralidharan Rajendran; Writing - original draft preparation: Muralidharan Rajendran; Writing - review and editing: Bonghee Hong; Visualization: Muralidharan Rajendran; Supervision: Bonghee Hong; Project administration: Bonghee Hong; Funding acquisition: Bonghee Hong

Corresponding author

Correspondence to Bonghee Hong.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and informed consent for data used

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent for publication

Both authors have read and agreed to the published version of the manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Rajendran, M., Hong, B. Autoregressive multimodal transformer for zero-shot sales forecasting of fashion products with exogenous data. Appl Intell 55, 108 (2025). https://doi.org/10.1007/s10489-024-05972-3


  • DOI: https://doi.org/10.1007/s10489-024-05972-3
