Abstract
Nowadays, time series data are used in many fields, such as economics, medicine, biology, science, society, nature, and the environment, and most typically in weather forecasting. Time series analysis provides methodological formulas and models that help us analyze time series data, extract potentially valuable information, capture historical fluctuations, and present and support forecasts of the future values of the research object. Many models and methods of time series analysis have been researched and improved for trend analysis and forecasting. Techniques for processing time series data include linear regression with the two features unique to time series, lags and time steps; trend features that model long-term changes with moving averages and a time dummy; seasonal indicators; Fourier features to capture periodic change; and time series as features, which predict the future from the past through a lag embedding. In this article, we build a new hybrid model called Lucy Hybrid that covers all steps of the machine learning process, including data pre-processing, model training, and model evaluation with the Mean Square Error (MSE), Root-Mean-Square Error (RMSE), and Mean Absolute Error (MAE), so that models can be compared and the one of best quality selected. The model also provides functions for saving and loading trained models, so that researchers can reuse them and save training time. Lucy Hybrid also supports trend and forecast functions for time series data. We experiment with a large dataset of more than 3,000,000 records from a large Ecuador-based grocery retailer, using Linear Regression, Elastic Net, Lasso, Ridge, Extra Trees Regressor, Random Forest Regressor, K-Neighbors Regressor, MLP Regressor, and XGB Regressor to create 20 sample Lucy Hybrid models, and we publish the full source code so that researchers can extend the model.
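The abstract combines several standard time-series features with a hybrid of a linear model and a second regressor. The sketch below is a minimal, numpy-only illustration of that general idea, not the LucyHybrid source code (which is in the repository cited in this article): a time dummy and Fourier terms feed a least-squares linear stage, lag features feed a k-nearest-neighbours stage fitted to the residuals that the linear stage leaves behind, and the two predictions are added. The names `trend_season_features`, `lag_features` and `HybridSketch`, the weekly period, and the synthetic data are all hypothetical. With scikit-learn, the two stages would typically be `LinearRegression` (or `Ridge`, `Lasso`, `ElasticNet`) and `KNeighborsRegressor` (or one of the other regressors named in the abstract). One plausible reading of the 20 sample models is the 4 × 5 pairings of the four linear models with the five other regressors; that pairing is an inference, not something the abstract states.

```python
import numpy as np

# Illustrative sketch only; not the LucyHybrid implementation. All names are hypothetical.

# Deterministic features for the linear stage: time dummy plus Fourier sine/cosine pairs.
def trend_season_features(n, period=7, fourier_order=2):
    t = np.arange(n, dtype=float)  # time dummy: 0, 1, 2, ...
    cols = [t]
    for k in range(1, fourier_order + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

# Lag embedding: column j holds y shifted by j + 1 steps (NaN where there is no history).
def lag_features(y, n_lags):
    out = np.full((len(y), n_lags), np.nan)
    for lag in range(1, n_lags + 1):
        out[lag:, lag - 1] = y[:-lag]
    return out

class HybridSketch:
    """Stage 1: least-squares linear fit on X1 (trend/seasonality).
    Stage 2: k-nearest-neighbours regression on X2 (lags), fitted to the
    residuals of stage 1. Prediction = stage 1 + stage 2."""

    def __init__(self, k=5):
        self.k = k

    @staticmethod
    def _design(X):
        return np.column_stack([np.ones(len(X)), X])  # prepend an intercept column

    def fit(self, X1, X2, y):
        A = self._design(X1)
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        self.X2_ = X2
        self.resid_ = y - A @ self.coef_  # what the linear stage failed to explain
        return self

    def predict(self, X1, X2):
        base = self._design(X1) @ self.coef_
        # Euclidean distance from each query row to every training row in X2.
        dist = np.linalg.norm(X2[:, None, :] - self.X2_[None, :, :], axis=2)
        nearest = np.argsort(dist, axis=1)[:, : self.k]
        return base + self.resid_[nearest].mean(axis=1)

# Evaluation metrics named in the abstract.
def mse(y, p):
    return float(np.mean((y - p) ** 2))

def rmse(y, p):
    return float(np.sqrt(mse(y, p)))

def mae(y, p):
    return float(np.mean(np.abs(y - p)))

# Usage on synthetic daily data (trend + weekly cycle + noise); not the grocery dataset.
rng = np.random.default_rng(0)
n, n_lags = 200, 7
y = 50 + 0.1 * np.arange(n) + 5 * np.sin(2 * np.pi * np.arange(n) / 7) + rng.normal(0, 1, n)

X1 = trend_season_features(n, period=7)
X2 = lag_features(y, n_lags)
X1, X2, y = X1[n_lags:], X2[n_lags:], y[n_lags:]  # drop rows with incomplete lags

split = len(y) - 30  # chronological split: hold out the last 30 steps
model = HybridSketch(k=5).fit(X1[:split], X2[:split], y[:split])
pred = model.predict(X1[split:], X2[split:])
print(f"MSE={mse(y[split:], pred):.3f} RMSE={rmse(y[split:], pred):.3f} MAE={mae(y[split:], pred):.3f}")
```

Because the lag features are built from observed past values, this evaluates one-step-ahead forecasts; forecasting further ahead would require feeding predictions back in as lags. The linear stage handles the trend because distance-based and tree-based regressors cannot extrapolate beyond the range of values seen in training.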

Availability of data and materials
Please contact the corresponding author for data requests. The Python code and models are available: Duy Thanh Tran, Jun-Ho Huh, Jae-Hwan Kim [44], “20 saved LucyHybrid models for reusing that we built”, available at https://github.com/thanhtd32/LucyHybrid/tree/main/models; and Duy Thanh Tran, Jun-Ho Huh, Jae-Hwan Kim [45], “Full source code of LucyHybrid model”, available at https://github.com/thanhtd32/LucyHybrid. We confirm we have included a data availability statement in our main manuscript file.
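The abstract and this statement mention saving trained models so that they can be reused without retraining. The serialization format used in the LucyHybrid repository is not stated here, so the sketch below simply assumes Python's standard `pickle` module; `save_model`, `load_model`, `predict_trend` and the parameter dictionary are hypothetical stand-ins for a real fitted model such as a scikit-learn estimator. Unpickling can execute arbitrary code, so saved models should only be loaded from trusted sources (deserialization attacks on Python are the topic of reference [43]).

```python
import pickle
import tempfile
from pathlib import Path

# Assumption: models are persisted with pickle. All names below are hypothetical.

def save_model(model, path):
    """Serialize a fitted model (any picklable object) to `path`."""
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_model(path):
    """Load a model written by save_model. Only load files from trusted
    sources: unpickling can run arbitrary code."""
    with open(path, "rb") as f:
        return pickle.load(f)

def predict_trend(params, steps):
    # Hypothetical "model": a fitted linear trend stored as a plain dict.
    return [params["intercept"] + params["slope"] * s for s in steps]

fitted = {"slope": 2.0, "intercept": 1.0}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "trend_model.pkl"
    save_model(fitted, path)
    restored = load_model(path)  # reuse without retraining

print(predict_trend(restored, [0, 1, 2]))  # [1.0, 3.0, 5.0]
```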
References
Time Series (2008) In: The concise encyclopedia of statistics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-32833-1_401
Adhikari R, Agrawal R (2013) An introductory study on time series modeling and forecasting. LAP LAMBERT Academic Publishing, Germany. https://doi.org/10.13140/2.1.2771.8084
Shumway RH, Stoffer DS (2005) Time series analysis and its applications (Springer Texts in Statistics). Springer, Berlin. https://doi.org/10.1007/978-3-319-52452-8
Ivanović M, Kurbalija V (2016) Time series analysis and possible applications. In: 2016 39th international convention on information and communication technology, electronics and microelectronics (MIPRO), pp. 473–479. https://doi.org/10.1109/MIPRO.2016.7522190
Lin J-C, Li Y-H, Liu C-H (2007) Solving the limitations of forecasting time series model by independent component analysis approach. In: Proceedings of the 18th IASTED international conference: modelling and simulation (MOAS'07). ACTA Press, USA, pp 254–259
Sauermann F, Schuh G, Gützlaff A, Theunissen T (2020) Application of time series data mining for the prediction of transition times in production. Procedia CIRP. https://doi.org/10.1016/j.procir.2020.04.054
Box GEP, Jenkins G, Reinsel G, Ljung G (2016) Time series analysis: forecasting and control. J Am Stat Assoc 68:342–493. https://doi.org/10.2307/2284112
https://www.datavedas.com/introduction-to-time-series-data. Accessed 21 May 2022
Lee CF (2020) Time-series analysis: components, models, and forecasting. In: Lee CF, Lee JC (eds) Handbook of financial econometrics, mathematics, statistics, and machine learning, chapter 26. World Scientific Publishing Co. Pte. Ltd., Singapore, pp 979–1024
Mohr DL, Wilson WJ, Freund RJ (2022) Chapter 8—multiple regression. In: Mohr DL, Wilson WJ, Freund RJ (eds) Statistical methods (4th edn), Academic Press, pp 351–444, ISBN 9780128230435, https://doi.org/10.1016/B978-0-12-823043-5.00008-4
Suhartono (2011) Time series forecasting by using seasonal autoregressive integrated moving average: subset, multiplicative or additive model. J Math Stat 7:20–27. https://doi.org/10.3844/jmssp.2011.20.27
Nerlove M, Diebold FX (1990) Autoregressive and moving-average time-series processes. In: Eatwell J, Milgate M, Newman P (eds) Time series and statistics. The New Palgrave. Palgrave Macmillan, London. https://doi.org/10.1007/978-1-349-20865-4_3
Siegel AF, Wagner MR (2022) Chapter 14—time series: understanding changes over time. In: Siegel AF, Wagner MR (eds) Practical business statistics (8th edn), Academic Press, Cambridge, pp 445–482, ISBN 9780128200254, https://doi.org/10.1016/B978-0-12-820025-4.00014-2
Ray S, Das SS, Mishra P et al (2021) Time Series SARIMA modelling and forecasting of monthly rainfall and temperature in the South Asian Countries. Earth Syst Environ 5:531–546. https://doi.org/10.1007/s41748-021-00205-w
Hansun S (2013) A new approach of moving average method in time series analysis. In: 2013 International Conference on New Media Studies, CoNMedia 2013, pp 1–4. https://doi.org/10.1109/CoNMedia.2013.6708545
Anbalagan T, Uma Maheswari S (2015) Classification and prediction of stock market index based on fuzzy metagraph. Procedia Computer Science 47:214–221. https://doi.org/10.1016/j.procs.2015.03.200
Ensafi Y, Amin SH, Zhang G, Shah B (2022) Time-series forecasting of seasonal items sales using machine learning—a comparative analysis. Int J Inf Manag Data Insights 2(1):100058. ISSN 2667–096, https://doi.org/10.1016/j.jjimei.2022.100058
Zuo Y, Yada K, Ali ABMS (2016) Prediction of consumer purchasing in a grocery store using machine learning techniques. In: 2016 3rd Asia-Pacific world congress on computer science and engineering (APWC on CSE), pp 18–25. https://doi.org/10.1109/APWC-on-CSE.2016.015
Martínez F, Charte F, Frías MP, Martínez-Rodríguez AM (2022) Strategies for time series forecasting with generalized regression neural networks. Neurocomputing 491: 509–521. ISSN 0925–2312, https://doi.org/10.1016/j.neucom.2021.12.028
Pantiskas L, Verstoep K, Bal H (2020) Interpretable multivariate time series forecasting with temporal attention convolutional neural networks. In: 2020 IEEE symposium series on computational intelligence, pp 1687–1694. https://doi.org/10.1109/SSCI47803.2020.9308570
Wibawa AP, Utama ABP, Elmunsyah H et al (2022) Time-series analysis with smoothed Convolutional Neural Network. J Big Data 9:44. https://doi.org/10.1186/s40537-022-00599-y
Vollmer MA, Glampson B, Mellan T et al (2021) A unified machine learning approach to time series forecasting applied to demand at emergency departments. BMC Emerg Med 21:9. https://doi.org/10.1186/s12873-020-00395-y
Fu Y, Wu D, Boulet B (2022) Reinforcement learning based dynamic model combination for time series forecasting. Proc AAAI Conf Artif Intell 36(6):6639–6647. https://doi.org/10.1609/aaai.v36i6.20618
Kaushik S et al (2020) AI in healthcare: time-series forecasting using statistical, neural, and ensemble architectures. Front Big Data. https://doi.org/10.3389/fdata.2020.00004
Wang H, Fan W, Sun F, Qian X (2015) An adaptive ensemble model of extreme learning machine for time series prediction. In: 2015 12th international computer conference on wavelet active media technology and information processing (ICCWAMTIP), pp 80–85. https://doi.org/10.1109/ICCWAMTIP.2015.7493911
Guo X, Pang Y, Yan G, Qiao T (2017) Time series forecasting based on deep extreme learning machine. In: 2017 29th Chinese control and decision conference (CCDC), pp 6151–6156. https://doi.org/10.1109/CCDC.2017.7978277
Singh R, Balasundaram S (2007) Application of extreme learning machine method for time series analysis. Proc World Acad Sci Eng Technol 26:361–367. https://doi.org/10.5281/zenodo.1078657
Athiyarath S, Paul M, Krishnaswamy S (2020) A comparative study and analysis of time series forecasting techniques. SN Comput Sci. https://doi.org/10.1007/s42979-020-00180-5
Jiang H, Ruan J, Sun J (2021) Application of machine learning model and hybrid model in retail sales forecast. In: 2021 IEEE 6th international conference on big data analytics (ICBDA), pp 69–75. https://doi.org/10.1109/ICBDA51983.2021.9403224
Aburto L, Weber R (2003) Demand forecast in a supermarket using a hybrid intelligent system. Design and application of hybrid intelligent systems. IOS Press, Amsterdam, pp 1076–1083
Wang J (2020) A hybrid machine learning model for sales prediction. Int Conf Intell Comput Hum Comput Int (ICHCI) 2020:363–366. https://doi.org/10.1109/ICHCI51889.2020.00083
Zhu H (2021) A deep learning based hybrid model for sales prediction of E-commerce with sentiment analysis. In: 2021 2nd international conference on computing and data science (CDS), pp 493–497. https://doi.org/10.1109/CDS52072.2021.00091
Omar H, Hoang V, Liu D-R (2016) A hybrid neural network model for sales forecasting based on ARIMA and search popularity of article titles. Comput Intell Neurosci 2016:1–9. https://doi.org/10.1155/2016/9656453
Lu C-J, Chang C-C (2014) A hybrid sales forecasting scheme by combining independent component analysis with K-means clustering and support vector regression. Sci World J. https://doi.org/10.1155/2014/624017
Aburto L, Weber R (2007) Improved supply chain management based on hybrid demand forecasts. Appl Soft Comput 7:136–144. https://doi.org/10.1016/j.asoc.2005.06.001
Hu X, Yang Y, Zhu S, Chen L (2020) Research on a hybrid prediction model for purchase behavior based on logistic regression and support vector machine. In: 2020 3rd international conference on artificial intelligence and big data (ICAIBD), pp 200–204, https://doi.org/10.1109/ICAIBD49809.2020.9137484
Pan F, Zhang H, Xia M (2009) A hybrid time-series forecasting model using extreme learning machines. Int Conf Intell Comput Technol Autom 1:933–936. https://doi.org/10.1109/ICICTA.2009.232
Wang W, Lu Y (2018) Analysis of the mean absolute error (MAE) and the Root Mean Square Error (RMSE) in assessing rounding model. IOP Conf Ser Mater Sci Eng 324:012049. https://doi.org/10.1088/1757-899X/324/1/012049
Torabi M, Rao JNK (2013) Estimation of mean squared error of model-based estimators of small area means under a nested error linear regression model. J Multivar Anal 117:76–87. ISSN 0047259X, https://doi.org/10.1016/j.jmva.2013.02.008
Chai T, Draxler RR (2014) Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geosci Model Dev 7:1247–1250. https://doi.org/10.5194/gmd-7-1247-2014
Pedregosa F et al (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825–2830
https://www.kaggle.com/competitions/store-sales-time-series-forecasting
Tanaka K, Saito T (2019) Python deserialization denial of services attacks and their mitigations. In: Lee R (ed) Computational science/intelligence and applied informatics. CSII 2018. Studies in computational intelligence, vol 787. Springer, Cham. https://doi.org/10.1007/978-3-319-96806-3_2
Tran DT, Huh J-H, Kim J-H. 20 saved LucyHybrid models for reusing that we built. Available at https://github.com/thanhtd32/LucyHybrid/tree/main/models
Tran DT, Huh J-H, Kim J-H. Full source code of LucyHybrid model. Available at https://github.com/thanhtd32/LucyHybrid
Acknowledgements
No funding.
Funding
No funding.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tran, D.T., Huh, JH. & Kim, JH. Building a Lucy hybrid model for grocery sales forecasting based on time series. J Supercomput 79, 4048–4083 (2023). https://doi.org/10.1007/s11227-022-04824-6
DOI: https://doi.org/10.1007/s11227-022-04824-6