ABSTRACT
Deep Reinforcement Learning (DRL) is highly resource-intensive: learning complicated tasks such as video-game playing and Go typically requires large-scale distributed computing nodes. This work scales a distributed DRL system down to a specialized many-core chip to achieve energy-efficient on-chip DRL. Building on a customized Network-on-Chip that handles the communication of on-chip data and control signals, we propose a Synchronous-Asynchronous RL Architecture (SARLA) and a corresponding many-core chip that avoids the unnecessary data duplication and synchronization of multi-node RL systems. In our evaluation, the SARLA system achieves a considerable energy-efficiency improvement over GPU-based implementations on typical DRL workloads built with OpenAI Gym.
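To make the overhead concrete, the sketch below is a toy asynchronous multi-worker RL update loop in the style of distributed A3C-like systems: each worker duplicates the shared parameters locally and must synchronize on a lock to publish its update. These are precisely the data-duplication and synchronization costs that an on-chip design like SARLA aims to eliminate. All names and the gradient stand-in are illustrative assumptions, not taken from the paper.

```python
import threading
import random

# Shared parameter vector that all asynchronous workers update.
shared_params = [0.0] * 4
lock = threading.Lock()

def worker(steps, lr=0.01, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        with lock:                       # synchronization overhead
            local = list(shared_params)  # per-worker data duplication
        # Stand-in for a policy-gradient estimate from environment rollouts.
        grad = [rng.uniform(-1.0, 1.0) for _ in local]
        with lock:                       # synchronize again to publish
            for i, g in enumerate(grad):
                shared_params[i] += lr * g

threads = [threading.Thread(target=worker, args=(100, 0.01, s))
           for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In a multi-node deployment these copy-and-lock steps become network transfers and barrier waits; on a single many-core chip with a dedicated NoC, the same exchange can be done without replicating the parameter state per node.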
Index Terms
- A many-core accelerator design for on-chip deep reinforcement learning