Abstract
There is tremendous growth in data generated by different industries, e.g., health, agriculture, and engineering, and a consequent demand for more processing power. Compared to central processing units (CPUs), general-purpose graphics processing units (GPUs) are rapidly emerging as a promising way to achieve high performance and energy efficiency across various computing domains. However, the multiple forms of parallelism and the complexity of memory access on GPUs pose a challenge to developing a GPU-based Random Forest (RF) algorithm. RF is a popular and robust machine learning algorithm. In this paper, coarse-grained and dynamic parallelism approaches on the GPU are integrated into RF (dpRFGPU). Experimental results for dpRFGPU are compared with a sequential CPU execution of RF (seqRFCPU) and an implementation that parallelises RF trees on the GPU (parRFGPU). Results show the average speedup improves from 1.62 for parRFGPU to 3.57 for dpRFGPU. Acceleration is also evident in both dpRFGPU and parRFGPU when RF is configured with, on average, 32 trees or more on low-dimensional datasets. Moreover, larger datasets save more time on the GPU than smaller ones, with dpRFGPU saving more time than parRFGPU. The dpRFGPU approach thus significantly accelerates the parallel construction of RF trees on the GPU by reducing training time.









Notes
Data independence in RF is facilitated by bagging. In bagging, each tree is built from an independent subset of the data, where each subset is generated by randomly sampling the original dataset with replacement [5].
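As an illustration, the bootstrap sampling underlying bagging can be sketched in a few lines of Python (a minimal sketch of the standard technique; the helper names are ours, not from the paper):

```python
import random

def bootstrap_sample(data, rng=random):
    """Draw len(data) records uniformly at random, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def bagging_subsets(data, n_trees):
    """One independent bootstrap subset per tree, as used in bagging."""
    return [bootstrap_sample(data) for _ in range(n_trees)]

data = list(range(10))
subsets = bagging_subsets(data, n_trees=4)
# Each subset has the same size as the original dataset, but sampling
# with replacement means it typically contains duplicates and omits
# some records, which makes the trees' training data independent.
```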
This is caused by irregular execution paths of threads in a warp.
This is caused by irregular execution paths of warps in a block.
Single-instruction multiple-thread (SIMT) is an execution model in which one instruction is dispatched to a group of threads that execute it in a multi-threaded, lockstep manner.
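The divergence effect described in these notes can be illustrated with a toy model (our own Python sketch, not code from the paper): under SIMT, threads in a warp that take different branch paths have those paths serialized, so a divergent warp pays roughly one pass per distinct path rather than the cost of a single path.

```python
def warp_passes(branch_per_thread):
    """Toy SIMT cost model: a warp executes every distinct branch path
    taken by any of its threads, one serialized pass per path."""
    return len(set(branch_per_thread))

# A uniform warp (all 32 threads take branch A) needs a single pass.
uniform = ["A"] * 32
# A divergent warp (half take A, half take B) needs two serialized
# passes, roughly doubling the warp's execution time.
divergent = ["A"] * 16 + ["B"] * 16

print(warp_passes(uniform))    # 1
print(warp_passes(divergent))  # 2
```

The same idea explains the block-level note: blocks whose warps follow irregular paths finish at different times, delaying the whole kernel.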
Compute capability determines the general specifications and available features of a GPU.
The selected datasets had a variety of characteristics (e.g., dimensionality and number of records) that could reduce bias in the experiments. These datasets also worked well with the program prototypes this research developed for the experiments.
References
Kirk DB, Hwu WW (2010) Programming massively parallel processors. Elsevier Inc., eBook ISBN: 9780123814739
Zheng R, Hu Q, Jin H (2018) GPUPerfML: a performance analytical model based on decision tree for GPU architectures. In: The Proceedings of the 20th International Conference on High Performance Computing and Communications, IEEE. https://doi.org/10.1109/HPCC/SmartCity/DSS.2018.00110
Senagi K, Jouandeau N (2018) Confidence in Random Forest for performance optimization. In: Bramer M, Petridis M (eds) Artificial intelligence. XXXV SGAI 2018. Lecture notes in computer science, vol 11311. Springer, Cham. https://doi.org/10.1007/978-3-030-04191-5_31
Vouzis PD, Sahinidis NV (2011) GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics 27(2):182–188. https://doi.org/10.1093/bioinformatics/btq644
Breiman L (2001) Random Forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324
Zhang J, Wang H, Feng W (2017) cuBLASTP: fine-grained parallelization of protein sequence search on CPU+GPU. IEEE/ACM Trans Comput Biol Bioinform 14(4). https://doi.org/10.1109/TCBB.2015.2489662
Wang J, Rubin N, Sidelnik A, Yalamanchili S (2016) LaPerm: locality aware scheduler for dynamic parallelism on GPUs. In: The Proceeding of the ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), vol. 44(3), pp 583–595, IEEE. https://doi.org/10.1109/ISCA.2016.57
Rich C, Alexandru NM (2006) An empirical comparison of supervised learning algorithms. In: ICML ’06 Proceedings of the 23rd International Conference on Machine learning, pp 161–168, ACM. https://doi.org/10.1145/1143844.1143865
Manuel FD, Eva C, Senen B (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15:3133–3181
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
Nawar S, Mouazen AM (2017) Comparison between Random Forests, artificial neural networks and gradient boosted machines methods of on-line vis-NIR spectroscopy measurements of soil total nitrogen and total carbon. Sensors. https://doi.org/10.3390/s17102428
Lie C, Deng J, Cao K, Xiao Y, Ma L, Wang W, Ma T, Shu C (2018) A comparison of Random Forest and support vector machine approaches to predict coal spontaneous combustion in gob. Fuel 239:297–311. https://doi.org/10.1016/j.fuel.2018.11.006
Wen Z, He B, Ramamohanarao K, Lu S, Shi J (2018) Efficient gradient boosted decision tree training on GPUs. In: The Proceedings of the International Parallel and Distributed Processing Symposium, IEEE. https://doi.org/10.1109/IPDPS.2018.00033
Daga M, Nutter M (2012) Exploiting Coarse-grained parallelism in B+ tree Searches on an APU. In: The Proceedings of the SC Companion: High Performance Computing, Networking Storage and Analysis, USA, IEEE. https://doi.org/10.1109/SC.Companion.2012.40
Chen J, Li K, Tang Z, Bilal K, Yu S, Weng C, Li K (2017) A parallel Random Forest algorithm for big data in a spark cloud computing environment. IEEE Trans Parallel Distrib Syst 28(4):919–933. https://doi.org/10.1109/TPDS.2016.2603511
Genuer R, Poggi J, Tuleau-Malot C, Villa-Vialaneix N (2017) Random Forests for big data. Big Data Res 9:28–46. https://doi.org/10.1016/j.bdr.2017.07.003
Lo WT, Chang YS, Sheu RK, Chiu CC, Yuan SM (2014) CUDT: a CUDA based decision tree algorithm. Sci World J. https://doi.org/10.1155/2014/745640
Hughes C, Hughes T (2008) Professional multicore programming: design and implementation for C++ developers. Wiley Publishing, Inc.
NVIDIA Corporation. CUDA Toolkit. [Online]. https://developer.nvidia.com/cuda-toolkit. Accessed April 2019
Quinlan JR (1994) C4.5 programs for machine learning. Mach Learn 16:235–240
Rauber T, Rünger G (2010) Parallel programming for multicore and cluster systems. Springer-Verlag, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04818-0
LeBard DN, Levine BG, Mertmann P, Barr SA, Jusufi A, Sanders S, Klein ML, Panagiotopoulos AZ (2012) Self-assembly of coarse-grained ionic surfactants accelerated by graphics processing units. Soft Matter. https://doi.org/10.1039/c1sm06787g
Nickolls J, Dally WJ (2010) The GPU computing Era. IEEE Micro. https://doi.org/10.1109/MM.2010.41
NVIDIA Corporation. CUDA documentation. [Online]. https://docs.nvidia.com/cuda/index.html. Accessed April 2019
Barlas G (2015) Multicore and GPU programming an integrated approach. Elsevier Inc
Luo GH, Huang SK, Chang YS, Yuan SM (2013) A parallel bees algorithm implementation on GPU. J Syst Archit. https://doi.org/10.1016/j.sysarc.2013.09.007
Nasridinov A, Lee Y, Park YH (2013) Decision tree construction on GPU: ubiquitous parallel computing approach. Computing. https://doi.org/10.1007/s00607-013-0343-z
Lettich F, Lucchese C, Maria Nardini F, Orlando S, Perego R, Tonellotto N, Venturini R (2018) Parallel traversal of large ensembles of decision trees. IEEE Trans Parallel Distrib Syst. https://doi.org/10.1109/TPDS.2018.2860982
You Y, Zhang Z, Hsieh CJ, Demmel J, Keutzer K (2019) Fast deep neural network training on distributed systems and cloud TPUs. IEEE Trans Parallel Distrib Syst. https://doi.org/10.1109/TPDS.2019.2913833
Mahale K, Kanaskar S, Kapadnis P, Desale M, Walunj SM (2015) Acceleration of game tree search using GPGPU. In: The Proceedings of the International Conference on Green Computing and Internet of Things (ICGCIoT), IEEE. https://doi.org/10.1109/ICGCIoT.2015.7380525
Senagi K, Jouandeau N (2018) A non-deterministic strategy for searching optimal number of trees hyperparameter in Random Forest. In: Proceedings of the Federated Conference on Computer Science and Information Systems (FedCSIS), IEEE. https://doi.org/10.15439/2018F202
Oshiro TP, Perez SJ, Baranauskas A (2012) How many trees in a Random Forest? In: Proceedings of the International Workshop on Machine Learning and Data Mining in Pattern Recognition, Springer, Berlin, Heidelberg, pp 154–168. https://doi.org/10.1007/978-3-642-31537-4_13
Dua D, Taniskidou KE (2017) UCI machine learning repository. [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science
NVIDIA Corporation. Profiler user's guide. [Online]. https://docs.nvidia.com/cuda/profiler-users-guide/#nvprof-overview. Accessed April 2019
Senagi K, Jouandeau N, Kamoni P (2017) Using parallel Random Forest classifier in predicting land suitability for crop production. J Agric Inform 8(3):23–32
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Senagi, K., Jouandeau, N. Parallel construction of Random Forest on GPU. J Supercomput 78, 10480–10500 (2022). https://doi.org/10.1007/s11227-021-04290-6