Research Article · DOI: 10.1145/3297858.3304058

FA3C: FPGA-Accelerated Deep Reinforcement Learning

Published: 04 April 2019

ABSTRACT

Deep Reinforcement Learning (Deep RL) is applied in many areas where an agent learns how to interact with its environment to achieve a goal, such as playing video games and controlling robots. Deep RL exploits a deep neural network (DNN) to eliminate the need for handcrafted feature engineering that requires prior domain knowledge. The Asynchronous Advantage Actor-Critic (A3C) method is one of the state-of-the-art Deep RL methods. In this paper, we present FA3C, an FPGA-based A3C Deep RL platform. Traditionally, FPGA-based DNN accelerators have focused mainly on inference by exploiting fixed-point arithmetic; our platform targets both inference and training using single-precision floating-point arithmetic. We demonstrate the performance and energy efficiency of FA3C using multiple A3C agents that learn the control policies of six Atari 2600 games. FA3C outperforms a state-of-the-art implementation on a high-end GPU (NVIDIA Tesla P100), achieving 27.9% better performance and 1.62x better energy efficiency.
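As context for the A3C structure the abstract refers to, the sketch below (plain Python with NumPy and threads, an assumption of ours rather than code from the paper) shows the asynchronous, lock-free multi-agent update pattern that A3C relies on: each worker snapshots the shared network, collects an n-step trajectory, and applies its gradient to the global parameters without locking. The names GlobalNet and worker and the stubbed environment are hypothetical.

```python
# A minimal sketch (not the paper's implementation) of the asynchronous
# actor-critic update pattern underlying A3C. The environment interaction
# and gradients are stubbed; FA3C instead maps each agent's DNN inference
# and training passes onto FPGA compute units.
import threading

import numpy as np

class GlobalNet:
    """Globally shared policy/value parameters, updated lock-free (Hogwild-style)."""
    def __init__(self, n_params, lr=1e-4):
        self.theta = np.zeros(n_params)
        self.lr = lr

    def apply_gradients(self, grad):
        # Asynchronous update: workers write without synchronization.
        self.theta -= self.lr * grad

def worker(global_net, n_steps=5, n_updates=100, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_updates):
        local_theta = global_net.theta.copy()  # snapshot shared parameters
        # Interact with the environment for n_steps and collect rewards;
        # stubbed here with random values in place of Atari emulator frames.
        rewards = rng.normal(size=n_steps)
        # n-step discounted return, the basis of A3C's advantage estimate
        R = 0.0
        for r in reversed(rewards):
            R = r + 0.99 * R
        # The actor-critic gradient (policy + value loss) would be computed
        # from local_theta here; stubbed as noise scaled by the return.
        grad = R * rng.normal(size=local_theta.shape)
        global_net.apply_gradients(grad)

if __name__ == "__main__":
    net = GlobalNet(n_params=64)
    agents = [threading.Thread(target=worker, args=(net,), kwargs={"seed": i})
              for i in range(4)]  # four asynchronous A3C agents
    for t in agents:
        t.start()
    for t in agents:
        t.join()
    print("parameter L2 norm after training:", np.linalg.norm(net.theta))
```

The lock-free update pattern is what lets many agents train one shared network concurrently; FA3C preserves this structure while accelerating each agent's forward and backward passes in hardware.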

Published in

ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems
April 2019, 1126 pages
ISBN: 9781450362405
DOI: 10.1145/3297858
Copyright © 2019 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ASPLOS '19 paper acceptance rate: 74 of 351 submissions (21%)
Overall acceptance rate: 535 of 2,713 submissions (20%)
