Abstract
We propose a Q-Learning hardware accelerator for a RISC-V platform. In particular, our work focuses on the Klessydra processor. To the best of our knowledge, this is the first work in the literature that addresses this topic. We implemented the system on an AMD-Xilinx ZedBoard development board using a small amount of hardware resources and requiring a limited dynamic power of 1.528 W. The data we obtained are compatible with the future implementation of more accelerators on the same device to enhance the capabilities of the system. Compared to a standard software version of the algorithm, our accelerator allows a speed-up of \(\times 36\) in convergence time and an energy saving of \(\times 34\). The results obtained prove how our proposed system is suitable for high-speed and low-energy applications like Edge Machine Learning and embedded IoT systems.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Dörflinger A, Albers M, Kleinbeck B, Guan Y, Michalik H, Klink R, Blochwitz C, Nechi A, Berekovic M (2021) A comparative survey of open-source application-class risc-v processor implementations. In: Proceedings of the 18th ACM international conference on computing frontiers, pp 12–20
Ramírez C, Castelló A, Quintana-Orti ES (2022) A blis-like matrix multiplication for machine learning in the risc-v isa-based gap8 processor. J Supercomput 78(16):18051–18060
Kovačević N, Mišeljić D, Stojković A (2022) Risc-v vector processor for acceleration of machine learning algorithms. In: 2022 30th Telecommunications Forum (TELFOR). IEEE, pp 1–4
Ottavi G, Garofalo A, Tagliavini G, Conti F, Benini L, Rossi D (2020) A mixed-precision risc-v processor for extreme-edge dnn inference. In: 2020 IEEE computer society annual symposium on VLSI (ISVLSI). IEEE, pp 512–517
Ciccarella G, Giuliano R, Mazzenga F, Vatalaro F, Vizzarri A (2019) Edge cloud computing in telecommunications: case studies on performance improvement and tco saving. In: 2019 fourth international conference on fog and mobile edge computing (FMEC). IEEE, pp 113–120
Rothmann M, Porrmann M (2022) A survey of domain-specific architectures for reinforcement learning. IEEE Access 10:13753–13767
Watkins CJ, Dayan P (1992) Q-learning. Mach Learn 8(3):279–292
Liu X, Diao J, Li N (2022) A fpga-based accelerator implementation for path planning using q_learning algorithm. J Phys: Conf Ser 2245. IOP Publishing
Sahoo SS, Baranwal AR, Ullah S, Kumar A (2021) Memorel: a memory-oriented optimization approach to reinforcement learning on fpga-based embedded systems. In: Proceedings of the 2021 on Great Lakes Symposium on VLSI, pp 339–346
Meng Y, Kuppannagari S, Rajat R, Srivastava A, Kannan R, Prasanna V (2020) Qtaccel: a generic fpga based design for q-table based reinforcement learning accelerators. In: 2020 IEEE international parallel and distributed processing symposium workshops (IPDPSW). IEEE, pp 107–114
Spanò S, Cardarilli GC, Di Nunzio L, Fazzolari R, Giardino D, Matta M, Nannarelli A, Re M (2019) An efficient hardware implementation of reinforcement learning: the q-learning algorithm. Ieee Access 7:186340–186351
Canese L, Cardarilli GC, Di Nunzio L, Fazzolari R, Re M, Spanó S (2022) Automatic ip core generator for fpga-based q-learning hardware accelerators. In: International conference on applications in electronics pervading industry, environment and society. Springer, Berlin, pp 242–247
Cheikh A, Sordillo S, Mastrandrea A, Menichelli F, Scotti G, Olivieri M (2021) Klessydra-t: designing vector coprocessors for multithreaded edge-computing cores. IEEE Micro 41(2):64–71
Gautschi M, Schiavone PD, Traber A, Loi I, Pullini A, Rossi D, Flamand E, Gürkaynak FK, Benini L (2017) Near-threshold risc-v core with dsp extensions for scalable iot endpoint devices. IEEE Trans Very Large Scale Integr (VLSI) Syst 25(10):2700–2713 (2017)
Cardarilli GC, Di Nunzio L, Fazzolari R, Giardino D, Re M, Ricci A, Spano S (2022) An fpga-based multi-agent reinforcement learning timing synchronizer. Comput Electr Eng 99:107749
Cardarilli GC, Di Nunzio L, Fazzolari R, Giardino D, Matta M, Re M, Spanò S (2020) An action-selection policy generator for reinforcement learning hardware accelerators. In: International conference on applications in electronics pervading industry, environment and society. Springer, Berlin, pp 267–272
Klessydra: Klessydra/pulpino-klessydra: an open-source microcontroller system based on risc-v. https://github.com/klessydra/pulpino-klessydra
Acknowledgments
The authors would like to thank Advanced Micro Devices, Inc. (AMD) for providing the FPGA software tools with the AMD-Xilinx University Program. This work is partially supported by Project ECS 0000024 Rome Technopole, CUP B83C22002820006, NRP Mission 4 Component 2 Investment 1.5, funded by the European Union—NextGenerationEU.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Angeloni, D., Canese, L., Cardarilli, G.C., Di Nunzio, L., Re, M., Spanò, S. (2024). A RISC-V Hardware Accelerator for Q-Learning Algorithm. In: Bellotti, F., et al. Applications in Electronics Pervading Industry, Environment and Society. ApplePies 2023. Lecture Notes in Electrical Engineering, vol 1110. Springer, Cham. https://doi.org/10.1007/978-3-031-48121-5_11
Download citation
DOI: https://doi.org/10.1007/978-3-031-48121-5_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48120-8
Online ISBN: 978-3-031-48121-5
eBook Packages: EngineeringEngineering (R0)