Abstract
Forecasting prices of used construction equipment is challenging due to spatial and temporal price fluctuations. Automating this forecasting process using current market data is, therefore, highly desirable. A promising and common strategy is the application of machine learning (ML) techniques. However, small and medium-sized enterprises often struggle to implement ML approaches because they lack ML expertise. In response, we demonstrate the potential of substituting manually created ML pipelines with automated machine learning (AutoML) solutions, which autonomously create the underlying pipelines. To this end, we follow the CRISP-DM process to identify tasks requiring ML expertise. First, we dissect the ML pipeline into a machine learning part and a non-machine learning part and use AutoML to automate the former. Subsequently, we also automate the data preprocessing step, which is part of the non-machine learning tasks, to further reduce the dependency on data processing expertise. Additionally, we implement a data-centric result evaluation that rates the reliability of the trained ML models. This approach supports the domain-driven creation of ML pipelines, democratizing the use of ML. To address complex industrial requirements and showcase the practicality of our approach, we developed a metric called the method evaluation score. This metric encompasses key technical and non-technical parameters essential for domain experts to assess the quality and usability of the generated models. Based on this metric, we demonstrate in our case study that combining domain knowledge with AutoML and automatic preprocessing can reduce the reliance on ML experts for innovative small and medium-sized enterprises keen on adopting such technologies.
Data availability
The data is available in the GitHub repository at https://github.com/AutoQML/End-to-End-Automated-Price-Forecasting.
Notes
The list of hyperparameters is available in Appendix 7.
Acknowledgements
This work was partly funded by the German Federal Ministry of Economic Affairs and Climate Action in the research project AutoQML (Grant no. 01MQ22002).
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
This article is part of the topical collection “Recent Trends on Data Science, Technology and Applications” guest edited by Slimane Hammoudi, Alfredo Cuzzocrea and Oleg Gusikhin.
Appendices
Appendix A Example Usage
The manual implementation of the ML methods (Polynomial Regression, Decision Tree, Random Forest, Support Vector Regressor, K-Nearest Neighbor, AdaBoost Regressor and Multi-Layer Perceptron) requires approximately 50 lines of code (LOC) per method on average and approximately 13 different libraries.
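For orientation, a condensed sketch of what one such manual pipeline can look like is given below, here for the Random Forest method with scikit-learn. It is an illustration under assumptions, not the implementation used in the study: the function name, the label column `price`, the imputation and encoding choices, and the small hyperparameter grid are all hypothetical.

```python
def forecast_prices_manual(train_df, test_df, label="price"):
    """Hand-built Random Forest pipeline: preprocessing, tuning, prediction.

    train_df / test_df are pandas DataFrames; `label` is a hypothetical
    name for the price column.
    """
    # Imports are kept local so the sketch loads without scikit-learn installed.
    from sklearn.compose import ColumnTransformer, make_column_selector
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.impute import SimpleImputer
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    X_train, y_train = train_df.drop(columns=[label]), train_df[label]

    # Numeric columns: fill gaps with the median, then standardize.
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    # Non-numeric columns: fill gaps with the mode, then one-hot encode.
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, make_column_selector(dtype_include="number")),
        ("cat", categorical, make_column_selector(dtype_exclude="number")),
    ])
    model = Pipeline([
        ("preprocess", preprocess),
        ("forest", RandomForestRegressor(random_state=0)),
    ])

    # Small, hand-picked grid (assumed values) tuned with 5-fold cross-validation.
    search = GridSearchCV(
        model,
        param_grid={
            "forest__n_estimators": [100, 300],
            "forest__max_depth": [None, 10, 20],
        },
        scoring="neg_mean_absolute_error",
        cv=5,
    )
    search.fit(X_train, y_train)

    # GridSearchCV refits the best configuration and predicts with it.
    return search.predict(test_df.drop(columns=[label], errors="ignore"))
```

Even this shortened version requires decisions on imputation, encoding, scaling, and the search grid, which is the kind of ML expertise the AutoML variants aim to replace.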
In contrast, training and prediction with AutoGluon can be implemented in three lines of code.
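A minimal sketch of such a call sequence is shown below, assuming the data is held in pandas DataFrames and the target column is named `price`. Both are assumptions, and the wrapper function is hypothetical; it only serves to keep the AutoGluon import local. The three lines inside the function are the import, the training call, and the prediction call.

```python
def forecast_prices_autogluon(train_data, test_data, label="price"):
    """Train AutoGluon on train_data and predict `label` for test_data.

    `label="price"` is a hypothetical column name, not taken from the study.
    """
    # Local import so the sketch loads without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    # fit() selects, trains, and ensembles models with default settings.
    predictor = TabularPredictor(label=label).fit(train_data)
    return predictor.predict(test_data)
```

No preprocessing, model selection, or hyperparameter grid has to be specified by the user; AutoGluon infers the problem type from the label column.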
The same holds for AutoSklearn, FLAML, and AutoKeras.
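The corresponding calls for these three frameworks can be sketched as follows. The wrapper functions, the feature and target variable names, and the default budgets (`time_budget=60` seconds, `max_trials=3`) are illustrative assumptions. The AutoKeras variant relies on `StructuredDataRegressor`, which ships with AutoKeras 1.x and may not be available in later releases.

```python
def forecast_autosklearn(X_train, y_train, X_test):
    """Auto-sklearn: fit a regressor with default settings and predict."""
    from autosklearn.regression import AutoSklearnRegressor

    automl = AutoSklearnRegressor()
    automl.fit(X_train, y_train)
    return automl.predict(X_test)


def forecast_flaml(X_train, y_train, X_test, time_budget=60):
    """FLAML: search for a regression model within `time_budget` seconds."""
    from flaml import AutoML

    automl = AutoML()
    automl.fit(X_train=X_train, y_train=y_train,
               task="regression", time_budget=time_budget)
    return automl.predict(X_test)


def forecast_autokeras(X_train, y_train, X_test, max_trials=3):
    """AutoKeras 1.x: try up to `max_trials` neural architectures."""
    import autokeras as ak

    regressor = ak.StructuredDataRegressor(max_trials=max_trials)
    regressor.fit(X_train, y_train)
    return regressor.predict(X_test)
```

In each case the user supplies only the training data and, optionally, a search budget; the framework handles model selection and tuning.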
Appendix B Search Spaces
About this article
Cite this article
Stühler, H., Klau, D., Zöller, MA. et al. End-to-End Implementation of Automated Price Forecasting Applications. SN COMPUT. SCI. 5, 402 (2024). https://doi.org/10.1007/s42979-024-02735-2
DOI: https://doi.org/10.1007/s42979-024-02735-2