ABSTRACT
Coarse-Grained Reconfigurable Architectures (CGRAs) are a promising solution for accelerating domain-specific applications because they combine energy efficiency with flexibility. Loops, the computation-intensive parts of applications, are often mapped onto CGRAs, and modulo scheduling is commonly used to improve execution performance. However, the performance actually achieved with modulo scheduling depends heavily on how well the Data Dependency Graph (DDG) extracted from a loop can be mapped. Because existing approaches usually separate routing exploration for multi-cycle dependences from mapping in order to compile quickly, they can easily suffer from poor mapping quality. In this paper, we integrate routing exploration into the mapping process, giving it more opportunities to find a globally optimized solution. Meanwhile, by defining a reduced resource graph, we keep the search space of the new mapping problem from growing greatly. To solve the problem efficiently, we introduce graph-neural-network-based reinforcement learning to predict a placement distribution over the resource nodes for all operations in a DDG. Using routing connectivity as the reward signal, we optimize the parameters of the neural network with a policy gradient method to find a valid mapping solution. Without much engineering or heuristic design, our approach achieves 1.57× the mapping quality of the state-of-the-art heuristic.
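The policy-gradient idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the graph neural network is replaced by a plain table of logits per operation, the time dimension of modulo scheduling is ignored, and the toy DDG, the 2x2 mesh, the conflict penalty, and every function name (`adjacent`, `reward`, `train`, `greedy_placement`) are hypothetical choices made only for illustration. The sketch samples a placement from a per-operation distribution over PEs, scores it by routing connectivity, and applies a REINFORCE-style update with a moving-average baseline.

```python
import math
import random

# Hypothetical toy instance (not from the paper): a 4-operation DDG chain
# and a 2x2 mesh of processing elements (PEs), numbered row-major.
DDG_EDGES = [(0, 1), (1, 2), (2, 3)]  # producer -> consumer
NUM_OPS = 4
GRID = 2
NUM_PES = GRID * GRID


def adjacent(pe_a, pe_b):
    """True if two PEs are identical or mesh neighbours (Manhattan distance <= 1)."""
    ra, ca = divmod(pe_a, GRID)
    rb, cb = divmod(pe_b, GRID)
    return abs(ra - rb) + abs(ca - cb) <= 1


def reward(placement):
    """Assumed reward: fraction of routable DDG edges minus a PE-conflict penalty."""
    routed = sum(adjacent(placement[u], placement[v]) for u, v in DDG_EDGES)
    conflicts = NUM_OPS - len(set(placement))
    return routed / len(DDG_EDGES) - 0.5 * conflicts


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=2000, lr=0.1, seed=0):
    """REINFORCE with a moving-average baseline over tabular logits.

    The logits table stands in for the output of a learned network.
    """
    rng = random.Random(seed)
    logits = [[0.0] * NUM_PES for _ in range(NUM_OPS)]
    baseline = 0.0
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        # Sample one PE per operation from the current placement distribution.
        placement = [rng.choices(range(NUM_PES), weights=p)[0] for p in probs]
        r = reward(placement)
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)
        # Gradient of log softmax w.r.t. logit k is (1[k == chosen] - prob_k).
        for op, pe in enumerate(placement):
            for k in range(NUM_PES):
                grad = (1.0 if k == pe else 0.0) - probs[op][k]
                logits[op][k] += lr * advantage * grad
    return logits


def greedy_placement(logits):
    """Pick the highest-scoring PE for each operation."""
    return [max(range(NUM_PES), key=lambda k: row[k]) for row in logits]


if __name__ == "__main__":
    placement = greedy_placement(train())
    print("placement:", placement, "reward:", reward(placement))
```

In this toy setting a placement such as `[0, 1, 3, 2]` routes every edge without conflicts and therefore receives the maximum reward of 1.0. Whether training reaches such a placement depends on the seed, the learning rate, and the number of steps.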
Index Terms
- Towards High-Quality CGRA Mapping with Graph Neural Networks and Reinforcement Learning