Building a Lucy hybrid model for grocery sales forecasting based on time series

The Journal of Supercomputing

Abstract

Nowadays, time series data are used in many fields, such as economics, medicine, biology, science, society, nature, and the environment, and most typically in weather forecasting. Time series analysis is a toolkit of methodological formulas and models that helps us analyze time series data, extract potentially valuable information, capture historical fluctuations, and support forecasts of the future values of the object under study. Many models and methods of time series analysis have been researched and improved in recent years for trend analysis and forecasting. Techniques for processing time series data include linear regression with the two features unique to time series, lags and time steps; trend features, built with moving averages and a time dummy, to model long-term changes; seasonal indicators; Fourier features to capture periodic change; and time series as features, which use a lag embedding to predict the future from the past. In this article, we build a new hybrid model called Lucy Hybrid that covers the full machine learning process, including data pre-processing, model training, and model evaluation with Mean Squared Error (MSE), Root-Mean-Square Error (RMSE) and Mean Absolute Error (MAE), so that models can be compared and the best one selected. The model also provides functions for storing and loading models, so that researchers can reuse trained models and save training time. Lucy Hybrid also supports trend and forecast functions for time series data. We experiment with a large dataset of more than 3,000,000 records from a large Ecuador-based grocery retailer, using Linear Regression, Elastic Net, Lasso, Ridge, Extra Trees Regressor, Random Forest Regressor, K-Neighbors Regressor, MLP Regressor and XGB Regressor to create 20 sample Lucy Hybrid models, and we publish the full source code so that researchers can extend the model.
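The feature types and the evaluation metrics named above can be illustrated with a small sketch. It assumes a generic two-stage ("boosted") hybrid: a first model (here `LinearRegression`) learns trend and seasonality from a time dummy, day-of-week indicators and Fourier terms, and a second model (here `ExtraTreesRegressor`) learns the first model's residuals from a lag embedding. This is not the authors' LucyHybrid code; `make_features`, `HybridSketch`, `evaluate` and the synthetic daily series are hypothetical stand-ins used only for illustration.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error


def make_features(y, n_lags=7, weekly=7, yearly=365.25, n_fourier=2):
    """Hypothetical helper: trend/seasonal features for model 1, lags for model 2.

    The first n_lags rows are dropped because their lags are not available.
    """
    n = len(y)
    t = np.arange(n, dtype=float)                        # time dummy (trend)
    dow = np.eye(weekly)[np.arange(n) % weekly][:, 1:]   # day-of-week indicators, one level dropped
    fourier = np.column_stack([                          # sin/cos pairs for a yearly cycle
        f(2 * np.pi * k * t / yearly)
        for k in range(1, n_fourier + 1) for f in (np.sin, np.cos)
    ])
    X1 = np.column_stack([t, dow, fourier])
    # Lag embedding: column k-1 holds y[i - k] for row i
    X2 = np.column_stack([y[n_lags - k:n - k] for k in range(1, n_lags + 1)])
    return X1[n_lags:], X2, y[n_lags:]


class HybridSketch:
    """Model 1 fits trend/seasonality; model 2 fits what model 1 leaves over."""

    def __init__(self, model_1, model_2):
        self.model_1, self.model_2 = model_1, model_2

    def fit(self, X1, X2, y):
        self.model_1.fit(X1, y)
        residuals = y - self.model_1.predict(X1)
        self.model_2.fit(X2, residuals)
        return self

    def predict(self, X1, X2):
        return self.model_1.predict(X1) + self.model_2.predict(X2)


def evaluate(y_true, y_pred):
    """MSE, RMSE and MAE, the three metrics used to compare models."""
    mse = mean_squared_error(y_true, y_pred)
    return {"MSE": mse, "RMSE": float(np.sqrt(mse)),
            "MAE": mean_absolute_error(y_true, y_pred)}


# Synthetic daily sales: linear trend + weekend bump + yearly cycle + noise
rng = np.random.default_rng(0)
n = 730
days = np.arange(n)
y = (100 + 0.05 * days + 10 * (days % 7 >= 5)
     + 5 * np.sin(2 * np.pi * days / 365.25) + rng.normal(0, 2, n))

X1, X2, target = make_features(y)
split = len(target) - 60  # hold out the last 60 days; time order is preserved
model = HybridSketch(LinearRegression(),
                     ExtraTreesRegressor(n_estimators=100, random_state=0))
model.fit(X1[:split], X2[:split], target[:split])
scores = evaluate(target[split:], model.predict(X1[split:], X2[split:]))
print(scores)
```

The split is chronological rather than shuffled, so the held-out days are always later than the training days, which is how a forecast is used in practice.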

Figs. 1–16

Availability of data and materials

Please contact the corresponding author for data requests. The Python code and models are available: Duy Thanh Tran, Jun-Ho Huh, Jae-Hwan Kim [44], “20 saved LucyHybrid models for reuse that we built”, available at https://github.com/thanhtd32/LucyHybrid/tree/main/models; and Duy Thanh Tran, Jun-Ho Huh, Jae-Hwan Kim [45], “Full source code of LucyHybrid model”, available at https://github.com/thanhtd32/LucyHybrid. We confirm that we have included a data availability statement in our main manuscript file.
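Reusing saved models, as described above, amounts to serializing a fitted estimator and loading it later. The sketch below assumes Python's standard `pickle` module; the file format actually used in the LucyHybrid repository is not stated here, and `save_model`, `load_model` and the file name are hypothetical. Because unpickling can execute arbitrary code, saved models should only be loaded from trusted sources (see [43] on deserialization attacks).

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression


def save_model(model, path):
    """Serialize a fitted estimator to disk (hypothetical helper)."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a pickled estimator; only do this for files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Fit a tiny model on y = 3x + 1 so the restored predictions are easy to check
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 1.0
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "linear_regression.pkl")  # hypothetical file name
    save_model(model, path)
    restored = load_model(path)

print(restored.predict([[20.0]]))  # approximately [61.]
```

Loading a saved model skips retraining entirely, which is the time saving the authors point to for large datasets.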

References

  1. Time Series (2008) In: The concise encyclopedia of statistics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-32833-1_401

  2. Adhikari R, Agrawal R (2013) An introductory study on time series modeling and forecasting. LAP LAMBERT Academic Publishing, Germany. https://doi.org/10.13140/2.1.2771.8084

  3. Shumway RH, Stoffer DS (2005) Time series analysis and its applications (Springer Texts in Statistics). Springer, Berlin. https://doi.org/10.1007/978-3-319-52452-8

  4. Ivanović M, Kurbalija V (2016) Time series analysis and possible applications. In: 2016 39th international convention on information and communication technology, electronics and microelectronics (MIPRO), pp 473–479. https://doi.org/10.1109/MIPRO.2016.7522190

  5. Lin J-C, Li Y-H, Liu C-H (2007) Solving the limitations of forecasting time series model by independent component analysis approach. In: Proceedings of the 18th IASTED international conference: modelling and simulation (MOAS'07). ACTA Press, USA, pp 254–259

  6. Sauermann F, Schuh G, Gützlaff A, Theunissen T (2020) Application of time series data mining for the prediction of transition times in production. Procedia CIRP. https://doi.org/10.1016/j.procir.2020.04.054

  7. Box B, Jenkins G, Reinsel G, Ljung G (2016) Time Series Analysis: Forecasting and Control. J Am Stat Assoc 68:342–493. https://doi.org/10.2307/2284112

  8. https://www.datavedas.com/introduction-to-time-series-data. Accessed 21 May 2022

  9. Lee CF (2020) Time-series analysis: components, models, and forecasting, World Scientific Book Chapters. In: Lee CF, Lee JC (eds) Handbook of financial econometrics, mathematics, statistics, and machine learning chapter-26. World Scientific Publishing Co. Pte. Ltd., Singapore, pp 979–1024

  10. Mohr DL, Wilson WJ, Freund RJ (2022) Chapter 8—multiple regression. In: Mohr DL, Wilson WJ, Freund RJ (eds) Statistical methods (4th edn), Academic Press, pp 351–444, ISBN 9780128230435, https://doi.org/10.1016/B978-0-12-823043-5.00008-4

  11. Suhartono (2011) Time series forecasting by using seasonal autoregressive integrated moving average: subset, multiplicative or additive model. J Math Stat 7:20–27. https://doi.org/10.3844/jmssp.2011.20.27

  12. Nerlove M, Diebold FX (1990) Autoregressive and Moving-average Time-series Processes. In: Eatwell J, Milgate M, Newman P (eds) Time series and statistics the New Palgrave. Palgrave Macmillan, London. https://doi.org/10.1007/978-1-349-20865-4_3

  13. Siegel AF, Wagner MR (2022) Chapter 14—time series: understanding changes over time. In: Siegel AF, Wagner MR (eds) Practical business statistics (8th edn), Academic Press, Cambridge, pp 445–482, ISBN 9780128200254, https://doi.org/10.1016/B978-0-12-820025-4.00014-2

  14. Ray S, Das SS, Mishra P et al (2021) Time Series SARIMA modelling and forecasting of monthly rainfall and temperature in the South Asian Countries. Earth Syst Environ 5:531–546. https://doi.org/10.1007/s41748-021-00205-w

  15. Hansun S (2013) A new approach of moving average method in time series analysis. In: 2013 International Conference on New Media Studies, CoNMedia 2013, pp 1–4. https://doi.org/10.1109/CoNMedia.2013.6708545

  16. Anbalagan T, Uma Maheswari S (2015) Classification and prediction of stock market index based on fuzzy metagraph. Procedia Computer Science 47:214–221. https://doi.org/10.1016/j.procs.2015.03.200

  17. Ensafi Y, Amin SH, Zhang G, Shah B (2022) Time-series forecasting of seasonal items sales using machine learning—a comparative analysis. Int J Inf Manag Data Insights 2(1):100058. ISSN 2667–096, https://doi.org/10.1016/j.jjimei.2022.100058

  18. Zuo Y, Yada K, Ali ABMS (2016) Prediction of consumer purchasing in a grocery store using machine learning techniques. In: 2016 3rd Asia-Pacific world congress on computer science and engineering (APWC on CSE), 2016, pp 18–25. https://doi.org/10.1109/APWC-on-CSE.2016.015

  19. Martínez F, Charte F, Frías MP, Martínez-Rodríguez AM (2022) Strategies for time series forecasting with generalized regression neural networks. Neurocomputing 491:509–521. ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2021.12.028

  20. Pantiskas L, Verstoep K, Bal H (2020) Interpretable multivariate time series forecasting with temporal attention convolutional neural networks. In: 2020 IEEE symposium series on computational intelligence, pp 1687–1694. https://doi.org/10.1109/SSCI47803.2020.9308570

  21. Wibawa AP, Utama ABP, Elmunsyah H et al (2022) Time-series analysis with smoothed Convolutional Neural Network. J Big Data 9:44. https://doi.org/10.1186/s40537-022-00599-y

  22. Vollmer MA, Glampson B, Mellan T et al (2021) A unified machine learning approach to time series forecasting applied to demand at emergency departments. BMC Emerg Med 21:9. https://doi.org/10.1186/s12873-020-00395-y

  23. Fu Y, Wu D, Boulet B (2022) Reinforcement learning based dynamic model combination for time series forecasting. Proc AAAI Conf Artif Intell 36(6):6639–6647. https://doi.org/10.1609/aaai.v36i6.20618

  24. Shruti K et al (2020) AI in healthcare: time-series forecasting using statistical, neural, and ensemble architectures. Front Big Data. https://doi.org/10.3389/fdata.2020.00004

  25. Wang H, Fan W, Sun F, Qian X (2015) An adaptive ensemble model of extreme learning machine for time series prediction. In: 2015 12th international computer conference on wavelet active media technology and information processing (ICCWAMTIP), pp 80–85. https://doi.org/10.1109/ICCWAMTIP.2015.7493911

  26. Guo X, Pang Y, Yan G, Qiao T (2017) Time series forecasting based on deep extreme learning machine. In: 2017 29th Chinese control and decision conference (CCDC), pp 6151–6156. https://doi.org/10.1109/CCDC.2017.7978277

  27. Singh R, Balasundaram S (2007) Application of extreme learning machine method for time series analysis. Proc World Acad Sci Eng Technol 26:361–367. https://doi.org/10.5281/zenodo.1078657

  28. Athiyarath S, Paul M, Krishnaswamy S (2020) A comparative study and analysis of time series forecasting techniques. SN Comput Sci. https://doi.org/10.1007/s42979-020-00180-5

  29. Jiang H, Ruan J, Sun J (2021) Application of machine learning model and hybrid model in retail sales forecast. In: 2021 IEEE 6th international conference on big data analytics (ICBDA), pp 69–75. https://doi.org/10.1109/ICBDA51983.2021.9403224

  30. Aburto L, Weber R (2003) Demand forecast in a supermarket using a hybrid intelligent system. Design and application of hybrid intelligent systems. IOS Press, Amsterdam, pp 1076–1083

  31. Wang J (2020) A hybrid machine learning model for sales prediction. Int Conf Intell Comput Hum Comput Int (ICHCI) 2020:363–366. https://doi.org/10.1109/ICHCI51889.2020.00083

  32. Zhu H (2021) A deep learning based hybrid model for sales prediction of E-commerce with sentiment analysis. In: 2021 2nd international conference on computing and data science (CDS), pp 493–497. https://doi.org/10.1109/CDS52072.2021.00091

  33. Omar H, Hoang V, Liu D-R (2016) A hybrid neural network model for sales forecasting based on ARIMA and search popularity of article titles. Comput Intell Neurosci 2016:1–9. https://doi.org/10.1155/2016/9656453

  34. Chi-Jie L, Chi-Chang C (2014) A hybrid sales forecasting scheme by combining independent component analysis with K-means clustering and support vector regression. Sci World J. https://doi.org/10.1155/2014/624017

  35. Aburto L, Weber R (2007) Improved supply chain management based on hybrid demand forecasts. Appl Soft Comput 7:136–144. https://doi.org/10.1016/j.asoc.2005.06.001

  36. Hu X, Yang Y, Zhu S, Chen L (2020) Research on a hybrid prediction model for purchase behavior based on logistic regression and support vector machine. In: 2020 3rd international conference on artificial intelligence and big data (ICAIBD), pp 200–204. https://doi.org/10.1109/ICAIBD49809.2020.9137484

  37. Pan F, Zhang H, Xia M (2009) A hybrid time-series forecasting model using extreme learning machines. Int Conf Intell Comput Technol Autom 1:933–936. https://doi.org/10.1109/ICICTA.2009.232

  38. Wang W, Lu Y (2018) Analysis of the mean absolute error (MAE) and the Root Mean Square Error (RMSE) in assessing rounding model. IOP Conf Ser Mater Sci Eng 324:012049. https://doi.org/10.1088/1757-899X/324/1/012049

  39. Torabi M, Rao JNK (2013) Estimation of mean squared error of model-based estimators of small area means under a nested error linear regression model. J Multivar Anal 117:76–87. ISSN 0047-259X, https://doi.org/10.1016/j.jmva.2013.02.008

  40. Chai T, Draxler RR (2014) Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geosci Model Dev 7:1247–1250. https://doi.org/10.5194/gmd-7-1247-2014

  41. Pedregosa F et al (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825–2830

  42. https://www.kaggle.com/competitions/store-sales-time-series-forecasting

  43. Tanaka K, Saito T (2019) Python deserialization denial of services attacks and their mitigations. In: Lee R (ed) Computational science/intelligence and applied informatics. CSII 2018. Studies in computational intelligence, vol 787. Springer, Cham. https://doi.org/10.1007/978-3-319-96806-3_2

  44. Tran DT, Huh J-H, Kim J-H. 20 saved LucyHybrid models for reuse that we built. Available at https://github.com/thanhtd32/LucyHybrid/tree/main/models

  45. Tran DT, Huh J-H, Kim J-H. Full source code of LucyHybrid model. Available at https://github.com/thanhtd32/LucyHybrid

Acknowledgements

No funding.

Funding

No funding.

Author information

Corresponding authors

Correspondence to Jun-Ho Huh or Jae-Hwan Kim.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tran, D.T., Huh, JH. & Kim, JH. Building a Lucy hybrid model for grocery sales forecasting based on time series. J Supercomput 79, 4048–4083 (2023). https://doi.org/10.1007/s11227-022-04824-6

  • DOI: https://doi.org/10.1007/s11227-022-04824-6
