research-article

RFAcc: a 3D ReRAM associative array based random forest accelerator

Published: 26 June 2019
DOI: 10.1145/3330345.3330387

Abstract

Random forest (RF) is a widely adopted machine learning method for solving classification and regression problems. Training a random forest demands a large number of relational comparison and data movement operations, which take a long time on modern CPUs. Accelerating random forest training with either GPUs or FPGAs achieves only modest speedups.
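To make the "large number of relational comparisons" concrete, the following is a minimal sketch of an exhaustive, CART-style split search for a single tree node. It assumes Gini impurity and uses every distinct feature value as a candidate threshold. This is a generic textbook formulation chosen for illustration, not the training algorithm used in the paper, and the names `gini` and `best_split` are hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative generic CART-style split search; not the RFAcc algorithm.


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(X: List[List[float]], y: List[int]) -> Tuple[int, float, float, int]:
    """Exhaustive split search for one tree node.

    Returns (feature, threshold, weighted_gini, comparisons), where
    `comparisons` counts the relational tests x[f] <= t performed.
    """
    n, d = len(X), len(X[0])
    best = (-1, 0.0, float("inf"))
    comparisons = 0
    for f in range(d):
        # Candidate thresholds: every distinct value of feature f.
        for t in sorted({row[f] for row in X}):
            left: List[int] = []
            right: List[int] = []
            for row, label in zip(X, y):
                comparisons += 1  # one relational comparison per sample
                (left if row[f] <= t else right).append(label)
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best[0], best[1], best[2], comparisons


if __name__ == "__main__":
    X = [[1.0, 5.0], [2.0, 4.0], [3.0, 3.0], [4.0, 2.0]]
    y = [0, 0, 1, 1]
    print(best_split(X, y))  # (0, 2.0, 0.0, 32)
```

In this naive form, a node with n samples and d features costs up to d·n² comparisons (2 · 4 · 4 = 32 in the usage example), and the search repeats for every node of every tree. Practical implementations typically sort each feature once and sweep thresholds incrementally, but the work is still dominated by comparisons and data movement.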
In this paper, we propose RFAcc, a ReRAM-based accelerator that speeds up the random forest training process. We first devise a 3D ReRAM-based relational comparison engine, referred to as 3D-VRComp, to enable parallel in-memory value comparison. We then build RFAcc on top of 3D-VRComp to speed up random forest training. Finally, we propose three optimizations, i.e., unary encoding, pipeline design, and parallel tree node training, to fully utilize the accelerator's resources and maximize throughput. Our experimental results show that, on average, RFAcc achieves 8564 and 16850 times speedup, and 6.6 × 10⁴ and 2.6 × 10⁵ times energy saving, over training on a 4.2 GHz Intel Core i7 CPU and an NVIDIA GTX1080 GPU, respectively.
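Unary encoding is one of the three optimizations listed above. The abstract does not describe how RFAcc lays the code out in its 3D array, so the following is only a sketch of why a unary (thermometer) code suits bulk comparison in general: if a value v is stored as v ones followed by zeros, testing x ≥ t for every stored value reduces to reading a single bit position, which an array can do for all rows at once. The thermometer layout and the names `unary_encode` and `ge_threshold` are assumptions for illustration.

```python
from typing import List

# Hypothetical helpers for illustration; not RFAcc's actual encoding scheme.


def unary_encode(v: int, width: int) -> List[int]:
    """Thermometer-style unary code: the first v of `width` bits are 1."""
    if not 0 <= v <= width:
        raise ValueError("value out of range for this code width")
    return [1] * v + [0] * (width - v)


def ge_threshold(codes: List[List[int]], t: int) -> List[bool]:
    """Evaluate x >= t for every stored code by reading one bit column.

    For t >= 1, x >= t holds exactly when bit t-1 of unary(x) is 1,
    so no multi-bit arithmetic is needed; t == 0 is always true.
    """
    if t == 0:
        return [True] * len(codes)
    return [code[t - 1] == 1 for code in codes]


if __name__ == "__main__":
    values = [0, 3, 5, 2]
    codes = [unary_encode(v, 5) for v in values]
    print(ge_threshold(codes, 3))  # [False, True, True, False]
```

The trade-off is width: a unary code needs N bits to represent N + 1 levels, versus about log₂(N + 1) bits in binary, so it is practical mainly for features quantized to a small number of levels.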

Information

Published In

ACM Conferences
ICS '19: Proceedings of the ACM International Conference on Supercomputing
June 2019
533 pages
ISBN: 9781450360791
DOI: 10.1145/3330345
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 June 2019

Author Tags

  1. ReRAM
  2. accelerator
  3. random forest

Qualifiers

  • Research-article

Conference

ICS '19

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 44
  • Downloads (last 6 weeks): 1
Reflects downloads up to 03 Mar 2025.

Citations

  • (2023) RT-Blink: A Method Toward Real-Time Blink Detection From Single Frontal EEG Signal. IEEE Sensors Journal 23:3, 2794-2802. DOI: 10.1109/JSEN.2022.3232176. Online publication date: 1-Feb-2023
  • (2023) A digital 3D TCAM accelerator for the inference phase of Random Forest. 2023 60th ACM/IEEE Design Automation Conference (DAC), 1-6. DOI: 10.1109/DAC56929.2023.10247695. Online publication date: 9-Jul-2023
  • (2022) Planting Fast-Growing Forest by Leveraging the Asymmetric Read/Write Latency of NVRAM-Based Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:10, 3304-3317. DOI: 10.1109/TCAD.2021.3126680. Online publication date: Oct-2022
  • (2022) ReCSA: a dedicated sort accelerator using ReRAM-based content addressable memory. Frontiers of Computer Science 17:2. DOI: 10.1007/s11704-022-1322-3. Online publication date: 8-Aug-2022
  • (2021) BLOCKSET (Block-Aligned Serialized Trees). Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 1170-1179. DOI: 10.1145/3447548.3467368. Online publication date: 14-Aug-2021
  • (2021) Enabling Efficient Random Data Insertion/Deletion on Block-based File Systems. IEEE Transactions on Computers, 1-1. DOI: 10.1109/TC.2021.3092178. Online publication date: 2021
  • (2020) ReSQM: Accelerating Database Operations Using ReRAM-Based Content Addressable Memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39:11, 4030-4041. DOI: 10.1109/TCAD.2020.3012860. Online publication date: Nov-2020
