Abstract
Given ongoing concern about food security, accurate prediction of wheat yield before harvest is essential. Random Forest (RF) has been used in many classification and regression applications, including yield estimation, and its performance has been improved by tuning its hyperparameters. In this paper, several modifications are made to the traditional RF for yield estimation, and the resulting performance is evaluated. Accordingly, RFs constructed from various weak learners, as well as a combined RF that mixes different weak learners, are assessed by growing weak Gaussian Process Regression (GPR), Decision Tree (DT), Neural Network (NN), and Stepwise Regression (SW) models in the forest. The input data to the DTs are also partitioned into leaves in an N-dimensional (e.g., two-dimensional) feature space by clustering at each parent node. In addition, each learner in the forest is trained on a subset of the training set that is randomly sampled with replacement, rather than on a random sample of the whole training set as in traditional RF. Clustering in the DTs added flexibility, while using NN as the weak learner yielded the most favorable results in this study. The number of training samples supplied to each tree (Itree) was also identified as a new hyperparameter of the forest, and the prediction results were influenced more by Itree than by the known hyperparameters, such as the number of trees in the forest (ntree) and the number of features for each tree (mtry).
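The role of the three hyperparameters can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that Itree means the number of draws with replacement used to train each learner, and it uses a depth-1 regression tree (a stump) as a stand-in for the GPR, DT, NN, and SW learners. The names `fit_stump`, `fit_forest`, `predict`, and the `fit_learner` argument are hypothetical, and the data are synthetic.

```python
import random
from statistics import mean


def fit_stump(X, y, features):
    """Fit a depth-1 regression tree over the given feature indices.

    Stand-in weak learner (assumption): the paper grows GPR, DT, NN and SW models.
    """
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for f in features:
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant prediction
        m = mean(y)
        return lambda row: m
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm


def fit_forest(X, y, ntree=50, mtry=1, itree=20, fit_learner=fit_stump, seed=0):
    """Train ntree learners, each on itree rows drawn with replacement
    and on mtry randomly chosen features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    learners = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(itree)]  # Itree draws (assumed meaning)
        feats = rng.sample(range(p), mtry)              # mtry features for this learner
        learners.append(fit_learner([X[i] for i in idx], [y[i] for i in idx], feats))
    return learners


def predict(learners, row):
    """Average the predictions of all learners in the forest."""
    return mean(learner(row) for learner in learners)


# Synthetic example: the target depends only on the first feature.
X = [[i / 99, (i * 37 % 100) / 99] for i in range(100)]
y = [3 * row[0] for row in X]
forest = fit_forest(X, y, ntree=100, mtry=2, itree=30)
low, high = predict(forest, [0.1, 0.5]), predict(forest, [0.9, 0.5])
```

Passing a different `fit_learner` callable is one way such a sketch could host other weak learners, and varying `itree` while holding `ntree` and `mtry` fixed is how the sensitivity described above could be probed on real data.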
Data availability
The datasets analyzed during the current study are available on reasonable request.
Funding
No funding was received to assist with the preparation of this manuscript.
Author information
Contributions
The entire manuscript was authored and implemented solely by the first author.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Communicated by H. Babaie
About this article
Cite this article
Manafifard, M. A new hyperparameter to random forest: application of remote sensing in yield prediction. Earth Sci Inform 17, 63–73 (2024). https://doi.org/10.1007/s12145-023-01156-8
DOI: https://doi.org/10.1007/s12145-023-01156-8