DOI: 10.1145/3545008.3545067
research-article
Public Access

Accelerating Random Forest Classification on GPU and FPGA

Published: 13 January 2023

Abstract

Random Forests (RFs) are a commonly used machine learning method for classification and regression tasks spanning a variety of application domains, including bioinformatics, business analytics, and software optimization. While prior work has focused primarily on improving the performance of RF training, many applications, such as malware identification, cancer prediction, and banking fraud detection, require fast RF classification.
In this work, we accelerate RF classification on GPU and FPGA. To efficiently support large datasets, we propose a hierarchical memory layout suited to the GPU/FPGA memory hierarchy. We design three RF classification code variants based on that layout, and we investigate GPU- and FPGA-specific considerations for these kernels. Our experimental evaluation, performed on an Nvidia Xp GPU and on a Xilinx Alveo U250 FPGA accelerator card using publicly available datasets with millions of samples and tens of features, covers several aspects. First, we evaluate the performance benefits of our hierarchical data structure over the standard compressed sparse row (CSR) format. Second, we compare our GPU implementation with cuML, a machine learning library targeting Nvidia GPUs. Third, we explore the performance/accuracy tradeoff resulting from the use of different tree depths in the RF. Finally, we perform a comparative performance analysis of our GPU and FPGA implementations. Our evaluation shows that our code variants outperform the CSR baseline on both GPU and FPGA, with the best performance achieved on GPU. For high accuracy targets, our GPU implementation yields a 5-9× speedup over CSR and up to a 2× speedup over Nvidia's cuML library.
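To make the kind of workload being accelerated concrete, the following is a minimal CUDA sketch of data-parallel RF inference over a flat node-array layout, roughly the style of structure a CSR-like baseline implies: one thread classifies one sample by walking every tree and majority-voting. The `Node` struct, the `rf_classify` kernel, and the binary-vote scheme are illustrative assumptions for this sketch, not the hierarchical layout or the three code variants proposed in the paper.

```cuda
#include <cuda_runtime.h>

// Hypothetical flattened decision-tree node: child links are indices into a
// single node array, so the whole forest lives in one contiguous buffer.
struct Node {
    int   feature;   // feature index tested at this node; -1 marks a leaf
    float threshold; // split threshold
    int   left;      // index of left child in the node array
    int   right;     // index of right child in the node array
    int   label;     // class label (valid only for leaves; 0/1 here)
};

// One thread classifies one sample: it traverses every tree from its root
// and takes a majority vote over the per-tree (binary) predictions.
__global__ void rf_classify(const Node* nodes, const int* tree_roots,
                            int num_trees, const float* samples,
                            int num_samples, int num_features, int* out) {
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= num_samples) return;

    const float* x = samples + (size_t)s * num_features;
    int votes = 0;
    for (int t = 0; t < num_trees; ++t) {
        int n = tree_roots[t];
        while (nodes[n].feature >= 0) {            // descend until a leaf
            n = (x[nodes[n].feature] <= nodes[n].threshold)
                    ? nodes[n].left : nodes[n].right;
        }
        votes += nodes[n].label;                   // accumulate binary vote
    }
    out[s] = (2 * votes > num_trees) ? 1 : 0;      // majority decision
}
```

In this flat-array form every node access goes to global memory; a hierarchical layout of the kind the paper proposes would additionally map frequently visited (shallow) tree levels onto faster levels of the GPU/FPGA memory hierarchy, which this baseline sketch does not model. An FPGA version would express the same traversal as an HLS C++ kernel rather than a CUDA kernel.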

Supplementary Material

Appendix (a4-appendix.pdf)



    Published In

    ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
    August 2022
    976 pages
    ISBN:9781450397339
    DOI:10.1145/3545008
    © 2022 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. FPGA
    2. GPU
    3. random forest classification

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICPP '22
    ICPP '22: 51st International Conference on Parallel Processing
    August 29 - September 1, 2022
    Bordeaux, France

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%

