
E2HRL: An Energy-efficient Hardware Accelerator for Hierarchical Deep Reinforcement Learning

Published: 21 September 2022

Abstract

Recently, Reinforcement Learning (RL) has shown great performance in solving sequential decision-making and control problems in dynamic environments. Despite these achievements, deploying Deep Neural Network (DNN)-based RL is expensive in terms of time and power due to the large number of episodes required to train agents on high-dimensional image representations. Additionally, at inference time the large energy footprint of deep neural networks can be a major drawback. Embedded edge devices, the main platform for deploying RL applications, are intrinsically resource-constrained, and deploying DNN-based RL on them is challenging. As a result, reducing the number of actions the RL agent must take to learn the desired policy, along with energy-efficient deployment of RL, is crucial. In this article, we propose Energy Efficient Hierarchical Reinforcement Learning (E2HRL), a scalable hardware architecture for RL applications. E2HRL utilizes a cross-layer design methodology to achieve better energy efficiency, smaller model size, higher accuracy, and system integration across the software and hardware layers. Our proposed RL agent model is designed around learning hierarchical policies, which makes the network architecture more efficient to implement on mobile devices. We evaluated our model in three RL environments with different levels of complexity. Simulation results and our analysis illustrate that hierarchical policy learning with several levels of control improves the RL agent's training efficiency: the agent learns the desired policy faster than a non-hierarchical model, and the improvement becomes more pronounced as the environment or task grows more complex with multiple subgoals. We tested our model with different hyperparameters to maximize the reward achieved by the RL agent while minimizing the model size, number of parameters, and required number of operations. The E2HRL model enables efficient deployment of the RL agent on resource-constrained embedded devices through the proposed custom hardware architecture, which is scalable and fully parameterized with respect to the number of input channels, filter size, and depth. The number of processing engines (PEs) in the proposed hardware can vary from 1 to 8, providing the flexibility to trade off latency, throughput, power, and energy efficiency. By performing a systematic hardware parameter analysis and design space exploration, we implemented the most energy-efficient E2HRL hardware architectures on a Xilinx Artix-7 FPGA and an NVIDIA Jetson TX2. Comparing the implementation results shows the Jetson TX2 board achieves 0.1 ∼ 1.3 GOP/S/W energy efficiency while the Artix-7 FPGA achieves 1.1 ∼ 11.4 GOP/S/W, i.e., 8.8× ∼ 11× better energy efficiency when E2HRL is implemented on the FPGA. Additionally, compared to similar works, our design shows better performance and energy efficiency.
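To make the hierarchical-policy idea concrete, the sketch below shows a two-level policy in the spirit described above: a high-level controller selects a subgoal from encoded image observations, and a low-level controller selects a primitive action conditioned on that subgoal. This is a minimal illustrative sketch in PyTorch with hypothetical module and variable names, assuming 64x64 RGB observations; it is not the authors' E2HRL implementation or its hardware mapping.

import torch
import torch.nn as nn

class HierarchicalPolicy(nn.Module):
    """Illustrative two-level policy: a high-level head picks a subgoal and a
    low-level head picks a primitive action conditioned on that subgoal."""
    def __init__(self, num_subgoals: int, num_actions: int, obs_shape=(3, 64, 64)):
        super().__init__()
        # Shared convolutional encoder for image observations.
        self.encoder = nn.Sequential(
            nn.Conv2d(obs_shape[0], 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            feat_dim = self.encoder(torch.zeros(1, *obs_shape)).shape[1]
        # High-level controller: subgoal logits from the encoded state.
        self.high_level = nn.Linear(feat_dim, num_subgoals)
        # Low-level controller: action logits from state features plus subgoal.
        self.low_level = nn.Linear(feat_dim + num_subgoals, num_actions)

    def forward(self, obs):
        feats = self.encoder(obs)
        subgoal_logits = self.high_level(feats)
        subgoal = torch.distributions.Categorical(logits=subgoal_logits).sample()
        subgoal_onehot = nn.functional.one_hot(subgoal, subgoal_logits.shape[-1]).float()
        action_logits = self.low_level(torch.cat([feats, subgoal_onehot], dim=-1))
        return subgoal_logits, action_logits

# Example usage: compute subgoal and action logits for a batch of two observations.
policy = HierarchicalPolicy(num_subgoals=4, num_actions=6)
subgoal_logits, action_logits = policy(torch.zeros(2, 3, 64, 64))

Conditioning the low-level head on a one-hot subgoal keeps both levels of control inside one compact network, which is the kind of small, regular structure that lends itself to a parameterized accelerator; the actual E2HRL network and training procedure are described in the full article.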


Cited By

  • (2024) Towards an international regulatory framework for AI safety: lessons from the IAEA's nuclear safety regulations. Humanities and Social Sciences Communications 11:1. DOI: 10.1057/s41599-024-03017-1. Online publication date: 12-Apr-2024.
  • (2023) Deploying Deep Reinforcement Learning Systems: A Taxonomy of Challenges. In 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME), 26–38. DOI: 10.1109/ICSME58846.2023.00015. Online publication date: 1-Oct-2023.


      Published In

      cover image ACM Transactions on Design Automation of Electronic Systems
      ACM Transactions on Design Automation of Electronic Systems  Volume 27, Issue 5
      September 2022
      274 pages
      ISSN:1084-4309
      EISSN:1557-7309
      DOI:10.1145/3540253
      Issue’s Table of Contents

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Publication History

      Published: 21 September 2022
      Online AM: 24 February 2022
      Accepted: 06 November 2021
      Revised: 04 October 2021
      Received: 16 June 2021
      Published in TODAES Volume 27, Issue 5


      Author Tags

      1. Reinforcement Learning
      2. CNN
      3. energy efficient hardware accelerator
      4. FPGA
      5. CPU

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • U.S. Army Research Laboratory

