research-article

RFAcc: a 3D ReRAM associative array based random forest accelerator

Authors:

Jun Yang

ICS '19: Proceedings of the ACM International Conference on Supercomputing

Pages 473 - 483

https://doi.org/10.1145/3330345.3330387

Published: 26 June 2019

Abstract

Random forest (RF) is a widely adopted machine learning method for solving classification and regression problems. Training a random forest demands a large number of relational comparison and data movement operations, which take a long time on modern CPUs. Accelerating random forest training with either GPUs or FPGAs achieves only modest speedups.
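To make the comparison cost concrete, the following is a minimal sketch of an exhaustive threshold search for a single feature at a single tree node. It is a generic textbook formulation, not the procedure evaluated in the paper; the function names `gini` and `best_split` and the toy data are assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Try every distinct feature value as a threshold (hypothetical helper).

    Returns (threshold, weighted_gini, comparisons), where `comparisons`
    counts the relational tests `v <= t` that were performed.
    """
    best_t, best_score, comparisons = None, float("inf"), 0
    n = len(values)
    for t in sorted(set(values)):
        left, right = [], []
        for v, y in zip(values, labels):
            comparisons += 1  # one relational comparison per sample per threshold
            (left if v <= t else right).append(y)
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score, comparisons


# Toy data: six samples of one feature with two classes.
values = [2, 7, 3, 9, 4, 8]
labels = ["a", "b", "a", "b", "a", "b"]
threshold, score, comparisons = best_split(values, labels)
```

In this naive form, six samples with six distinct values already require 36 comparisons for one feature at one node, and the count grows with the number of samples, candidate thresholds, features, nodes, and trees.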

In this paper, we propose RFAcc, a ReRAM-based accelerator, to speed up the random forest training process. We first devise a 3D ReRAM-based relational comparison engine, referred to as 3D-VRComp, to enable parallel in-memory value comparison. We then use 3D-VRComp to construct RFAcc and speed up random forest training. Finally, we propose three optimizations, i.e., unary encoding, pipeline design, and parallel tree node training, to fully utilize the accelerator resources and maximize throughput. Our experimental results show that, on average, RFAcc achieves 8564 and 16850 times speedup and 6.6 × 10⁴ and 2.6 × 10⁵ times energy saving over training on a 4.2GHz Intel Core i7 CPU and an NVIDIA GTX1080 GPU, respectively.
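The abstract names unary encoding as an optimization but does not spell out the scheme. As a hedged illustration of why a unary code suits relational comparison, the sketch below uses a generic thermometer code, in which a value v is stored as v ones followed by zeros, so that the test "value > threshold" reduces to reading a single bit. This is software only and is not a model of 3D-VRComp; `unary_encode`, `greater_than`, and `compare_column` are hypothetical names.

```python
def unary_encode(value, width):
    """Thermometer code: the lowest `value` bits are 1, the rest are 0."""
    if not 0 <= value <= width:
        raise ValueError("value out of range for this width")
    return [1 if i < value else 0 for i in range(width)]


def greater_than(code, threshold):
    """value > threshold holds exactly when the bit at index `threshold` is set."""
    return code[threshold] == 1


def compare_column(values, threshold, width):
    """Compare every stored value against one threshold via one bit position."""
    codes = [unary_encode(v, width) for v in values]
    return [greater_than(code, threshold) for code in codes]


# Assumed toy data: feature values in the range 0..10, threshold 4.
values = [2, 7, 3, 9, 4, 8]
mask = compare_column(values, threshold=4, width=10)
```

Because every comparison against the same threshold inspects the same bit position, all stored values can in principle be tested at once, which is the kind of parallelism an associative array can exploit. The cost is storage: a unary code needs one bit per representable level rather than a logarithmic number of bits.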

Published In

ICS '19: Proceedings of the ACM International Conference on Supercomputing

June 2019

533 pages

ISBN: 9781450360791

DOI: 10.1145/3330345

General Chair: Rudolf Eigenmann (University of Delaware)

Program Chairs: Chen Ding (University of Rochester), Sally A. McKee (Clemson University)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICS '19

Sponsor:

SIGARCH

ICS '19: 2019 International Conference on Supercomputing

June 26 - 28, 2019

Phoenix, Arizona

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics

Article Metrics

Total Citations: 7
Total Downloads: 289
Downloads (Last 12 months): 44
Downloads (Last 6 weeks): 1

Reflects downloads up to 03 Mar 2025

Cited By

Zhang Y, Zheng X, Xu W, Liu H (2023). RT-Blink: A Method Toward Real-Time Blink Detection From Single Frontal EEG Signal. IEEE Sensors Journal 23:3 (2794-2802). Online publication date: 1-Feb-2023.
https://doi.org/10.1109/JSEN.2022.3232176
Tsai C, Wu C, Chang Y, Hu H, Lee Y, Li H, Kuo T (2023). A digital 3D TCAM accelerator for the inference phase of Random Forest. 2023 60th ACM/IEEE Design Automation Conference (DAC) (1-6). Online publication date: 9-Jul-2023.
https://doi.org/10.1109/DAC56929.2023.10247695
Liang Y, Chen T, Chang Y, Huang Y, Shih W (2022). Planting Fast-Growing Forest by Leveraging the Asymmetric Read/Write Latency of NVRAM-Based Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:10 (3304-3317). Online publication date: Oct-2022.
https://doi.org/10.1109/TCAD.2021.3126680
Li H, Jin H, Zheng L, Huang Y, Liao X (2022). ReCSA: a dedicated sort accelerator using ReRAM-based content addressable memory. Frontiers of Computer Science 17:2. Online publication date: 8-Aug-2022.
https://doi.org/10.1007/s11704-022-1322-3
Madhyastha M, Lillaney K, Browne J, Vogelstein J, Burns R, Zhu F, Chin Ooi B, Miao C, Wang H, Skrypnyk I, Hsu W, Chawla S (2021). BLOCKSET (Block-Aligned Serialized Trees). Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (1170-1179). Online publication date: 14-Aug-2021.
https://dl.acm.org/doi/10.1145/3447548.3467368
Lien Y, Chen Y, Huang P (2021). Enabling Efficient Random Data Insertion/Deletion on Block-based File Systems. IEEE Transactions on Computers (1-1). Online publication date: 2021.
https://doi.org/10.1109/TC.2021.3092178
Li H, Jin H, Zheng L, Liao X (2020). ReSQM: Accelerating Database Operations Using ReRAM-Based Content Addressable Memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39:11 (4030-4041). Online publication date: Nov-2020.
https://doi.org/10.1109/TCAD.2020.3012860
