Parallel construction of Random Forest on GPU

The Journal of Supercomputing

Abstract

There is tremendous growth in the data generated by different industries, e.g., health, agriculture, and engineering, and consequently a growing demand for processing power. Compared to central processing units (CPUs), general-purpose graphics processing units (GPUs) are rapidly emerging as a promising way to achieve high performance and energy efficiency in various computing domains. Random Forest (RF) is a popular and robust machine learning algorithm, but its multiple forms of parallelism and complex memory access patterns make developing a GPU-based RF algorithm challenging. In this paper, coarse-grained and dynamic parallelism approaches on the GPU are integrated into RF (dpRFGPU). Experimental results for dpRFGPU are compared with a sequential CPU execution of RF (seqRFCPU) and with RF trees parallelized on the GPU (parRFGPU). Results show improved average speedups of 1.62 for parRFGPU and 3.57 for dpRFGPU. Acceleration is also evident when RF is configured with 32 or more trees, in both dpRFGPU and parRFGPU, on low-dimensional datasets. Furthermore, larger datasets save more time on the GPU than smaller ones, with dpRFGPU saving more time than parRFGPU. The dpRFGPU approach thus significantly accelerates the parallel construction of RF trees on the GPU by reducing training time.
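
To make the two techniques named in the abstract concrete, the following is a minimal CUDA sketch, not the authors' implementation: all kernel and parameter names (build_forest, evaluate_splits, etc.) are hypothetical. It combines coarse-grained parallelism (one GPU thread per tree, exploiting the data independence that bagging provides) with dynamic parallelism (each tree launching a child grid to evaluate candidate splits). Dynamic parallelism requires compute capability 3.5 or higher and compilation with nvcc -rdc=true -lcudadevrt.

    // Child kernel: one thread per candidate feature evaluates a split.
    __global__ void evaluate_splits(const float *samples, int n_samples,
                                    int n_features, float *gains) {
        int f = blockIdx.x * blockDim.x + threadIdx.x;
        if (f < n_features)
            gains[f] = 0.0f;  // placeholder for an impurity-gain computation
    }

    // Parent kernel: coarse-grained, one thread per tree. Each tree launches
    // its own child grid (dynamic parallelism); CUDA guarantees that child
    // grids complete before the parent grid is considered finished.
    __global__ void build_forest(const float *samples, int n_samples,
                                 int n_features, int n_trees, float *gains) {
        int tree = blockIdx.x * blockDim.x + threadIdx.x;
        if (tree < n_trees)
            evaluate_splits<<<(n_features + 127) / 128, 128>>>(
                samples, n_samples, n_features, gains + tree * n_features);
    }

    // Host side: launch one parent thread per tree, e.g.
    //   build_forest<<<1, n_trees>>>(d_samples, n_samples, n_features,
    //                                n_trees, d_gains);

In a full implementation the parent thread would recurse over tree nodes; the sketch only shows the launch structure that lets split evaluation for all trees proceed concurrently.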


Notes

  1. Data independence in RF is facilitated by bagging: each tree is built from an independent subset of data, generated by randomly sampling the original dataset with replacement [5]. A hypothetical GPU sketch of bagging is given after these notes.

  2. This is caused by irregular execution paths of threads in a warp.

  3. This is caused by irregular execution paths of warps in a block.

  4. Single-instruction, multiple-thread (SIMT) is an execution model in which one instruction stream is dispatched to a group of threads that execute it in lockstep. A toy kernel illustrating the divergence described in notes 2 and 3 is given after these notes.

  5. Compute capability determines the general specifications and available features of a GPU; a runtime query example is given after these notes.

  6. The selected datasets had a variety of characteristics (e.g., dimensionality and number of records) that could reduce bias in the experiments. These datasets also worked well with the program prototypes developed for the experiments.
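
Bagging (note 1) can itself be done on the GPU. The following hypothetical sketch (not from the paper; kernel and variable names are ours) draws one bootstrap sample per tree, using an independent cuRAND sequence per thread:

    #include <curand_kernel.h>

    // Each thread draws the bootstrap sample for one tree: n_samples row
    // indices chosen uniformly at random, with replacement.
    __global__ void draw_bootstrap(int *indices, int n_samples, int n_trees,
                                   unsigned long long seed) {
        int tree = blockIdx.x * blockDim.x + threadIdx.x;
        if (tree >= n_trees) return;
        curandState state;
        curand_init(seed, tree, 0, &state);  // independent sequence per tree
        int *mine = indices + tree * n_samples;
        for (int i = 0; i < n_samples; ++i)
            mine[i] = curand(&state) % n_samples;  // with replacement
    }

The modulo introduces a slight sampling bias, which is negligible for a sketch of this kind.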
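
The divergence described in notes 2 and 3 can be seen in a toy kernel (ours, not the paper's). Under SIMT (note 4), the 32 threads of a warp share one instruction stream, so a data-dependent branch forces the warp to execute both paths serially with inactive threads masked off:

    __global__ void divergent(const float *x, float *y, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            if (x[i] > 0.0f)         // threads of a warp may disagree here,
                y[i] = 2.0f * x[i];  // so this path executes first...
            else
                y[i] = -x[i];        // ...then this one, serially.
        }
    }

Tree construction is prone to such divergence because different samples follow different paths through a tree.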
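
The compute capability mentioned in note 5 can be queried with the standard CUDA runtime API; for example:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);  // properties of device 0
        // Dynamic parallelism requires compute capability 3.5 or higher.
        printf("Compute capability: %d.%d\n", prop.major, prop.minor);
        return 0;
    }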

References

  1. Kirk DB, Hwu WW (2010) Programming massively parallel processors: a hands-on approach. Morgan Kaufmann/Elsevier Inc., eBook ISBN: 9780123814739

  2. Zheng R, Hu Q, Jin H (2018) GPUPerfML: a performance analytical model based on decision tree for GPU architectures. In: The Proceedings of the 20th International Conference on High Performance Computing and Communications, IEEE. https://doi.org/10.1109/HPCC/SmartCity/DSS.2018.00110

  3. Senagi K, Jouandeau N (2018) Confidence in Random Forest for performance optimization. In: Bramer M, Petridis M (eds) Artificial intelligence. XXXV SGAI 2018. Lecture notes in computer science, vol 11311. Springer, Cham. https://doi.org/10.1007/978-3-030-04191-5_31


  4. Vouzis PD, Sahinidis NV (2011) GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics 27(2):182–188. https://doi.org/10.1093/bioinformatics/btq644


  5. Breiman L (2001) Random Forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324


  6. Zhang J, Wang H, Feng W (2017) cuBLASTP: fine-grained parallelization of protein sequence search on CPU+GPU. IEEE/ACM Trans Comput Biol Bioinform 14(4). https://doi.org/10.1109/TCBB.2015.2489662

  7. Wang J, Rubin N, Sidelnik A, Yalamanchili S (2016) LaPerm: locality aware scheduler for dynamic parallelism on GPUs. In: The Proceeding of the ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), vol. 44(3), pp 583–595, IEEE. https://doi.org/10.1109/ISCA.2016.57

  8. Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning algorithms. In: ICML '06: Proceedings of the 23rd International Conference on Machine Learning, pp 161–168, ACM. https://doi.org/10.1145/1143844.1143865

  9. Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15:3133–3181


  10. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140


  11. Nawar S, Mouazen AM (2017) Comparison between Random Forests, artificial neural networks and gradient boosted machines methods of on-line vis-NIR spectroscopy measurements of soil total nitrogen and total carbon. Sensors. https://doi.org/10.3390/s17102428


  12. Lei C, Deng J, Cao K, Xiao Y, Ma L, Wang W, Ma T, Shu C (2018) A comparison of Random Forest and support vector machine approaches to predict coal spontaneous combustion in gob. Fuel 239:297–311. https://doi.org/10.1016/j.fuel.2018.11.006


  13. Wen Z, He B, Ramamohanarao K, Lu S, Shi J (2018) Efficient gradient boosted decision tree training on GPUs. In: The Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS), IEEE. https://doi.org/10.1109/IPDPS.2018.00033

  14. Daga M, Nutter M (2012) Exploiting Coarse-grained parallelism in B+ tree Searches on an APU. In: The Proceedings of the SC Companion: High Performance Computing, Networking Storage and Analysis, USA, IEEE. https://doi.org/10.1109/SC.Companion.2012.40

  15. Chen J, Li K, Tang Z, Bilal K, Yu S, Weng C, Li K (2017) A parallel Random Forest algorithm for big data in a Spark cloud computing environment. IEEE Trans Parallel Distrib Syst 28(4):919–933. https://doi.org/10.1109/TPDS.2016.2603511


  16. Genuer R, Poggi J, Tuleau-Malot C, Villa-Vialaneix N (2017) Random Forests for big data. Big Data Res 9:28–46. https://doi.org/10.1016/j.bdr.2017.07.003


  17. Lo WT, Chang YS, Sheu RK, Chiu CC, Yuan SM (2014) CUDT: a CUDA based decision tree algorithm. Sci World J. https://doi.org/10.1155/2014/745640


  18. Hughes C, Hughes T (2008) Professional multicore programming: design and implementation for C++ developers. Wiley Publishing, Inc.

  19. NVIDIA Corporation. CUDA Toolkit. [Online]. Available: https://developer.nvidia.com/cuda-toolkit. [Accessed: April 2019]

  20. Quinlan JR (1994) C4.5: programs for machine learning. Mach Learn 16:235–240


  21. Rauber T, Rünger G (2010) Parallel programming for multicore and cluster systems. Springer-Verlag, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04818-0


  22. LeBard DN, Levine BG, Mertmann P, Barr SA, Jusufi A, Sanders S, Klein ML, Panagiotopoulos AZ (2012) Self-assembly of coarse-grained ionic surfactants accelerated by graphics processing units. Soft Matter. https://doi.org/10.1039/c1sm06787g


  23. Nickolls J, Dally WJ (2010) The GPU computing era. IEEE Micro. https://doi.org/10.1109/MM.2010.41


  24. NVIDIA Corporation. CUDA Toolkit documentation. [Online]. Available: https://docs.nvidia.com/cuda/index.html. [Accessed: April 2019]

  25. Barlas G (2015) Multicore and GPU programming: an integrated approach. Elsevier Inc.

  26. Luo GH, Huang SK, Chang YS, Yuan SM (2013) A parallel bees algorithm implementation on GPU. J Syst Archit. https://doi.org/10.1016/j.sysarc.2013.09.007

  27. Nasridinov A, Lee Y, Park YH (2013) Decision tree construction on GPU: ubiquitous parallel computing approach. Computing. https://doi.org/10.1007/s00607-013-0343-z

  28. Lettich F, Lucchese C, Nardini FM, Orlando S, Perego R, Tonellotto N, Venturini R (2018) Parallel traversal of large ensembles of decision trees. IEEE Trans Parallel Distrib Syst. https://doi.org/10.1109/TPDS.2018.2860982


  29. You Y, Zhang Z, Hsieh CJ, Demmel J, Keutzer K (2019) Fast deep neural network training on distributed systems and cloud TPUs. IEEE Trans Parallel Distrib Syst. https://doi.org/10.1109/TPDS.2019.2913833


  30. Mahale K, Kanaskar S, Kapadnis P, Desale M, Walunj SM (2015) Acceleration of game tree search using GPGPU. In: The Proceedings of the International Conference on Green Computing and Internet of Things (ICGCIoT), IEEE. https://doi.org/10.1109/ICGCIoT.2015.7380525

  31. Senagi K, Jouandeau N (2018) A non-deterministic strategy for searching optimal number of trees hyperparameter in Random Forest. In: Proceedings of the Federated Conference on Computer Science and Information Systems (FedCSIS), IEEE. https://doi.org/10.15439/2018F202

  32. Oshiro TM, Perez PS, Baranauskas JA (2012) How many trees in a Random Forest? In: Proceedings of the International Workshop on Machine Learning and Data Mining in Pattern Recognition, Springer, Berlin, Heidelberg, pp 154–168. https://doi.org/10.1007/978-3-642-31537-4_13

  33. Dua D, Taniskidou KE (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science

  34. NVIDIA Corporation. Profiler user's guide. [Online]. Available: https://docs.nvidia.com/cuda/profiler-users-guide/#nvprof-overview. [Accessed: April 2019]

  35. Senagi K, Jouandeau N, Kamoni P (2017) Using parallel Random Forest classifier in predicting land suitability for crop production. J Agric Inform 8(3):23–32


Author information


Corresponding author

Correspondence to Kennedy Senagi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Senagi, K., Jouandeau, N. Parallel construction of Random Forest on GPU. J Supercomput 78, 10480–10500 (2022). https://doi.org/10.1007/s11227-021-04290-6

