Abstract
Coarse-Grained Reconfigurable Array (CGRA) architectures are becoming increasingly popular as low-power accelerators in compute- and data-intensive application domains such as security, multimedia, signal processing, and machine learning. The efficiency of a CGRA is determined by its architectural features and by the compiler’s ability to exploit its spatio-temporal configuration. Numerous design optimizations and mapping techniques have been introduced in this direction. However, the execution model has been overlooked, despite its critical role in ensuring efficient acceleration of applications. Most existing CGRA implementations follow a hosted approach, i.e., they execute only the modulo-scheduled innermost loop and entrust the outer loops to the host processor. This increases synchronization overhead with the host, diminishing the benefits of the acceleration provided by the CGRA. In this paper, we propose a compilation flow that supports efficient standalone execution of nested loops. Experiments show that the standalone execution model yields a maximum performance improvement of \(12.33\times\) and an average of \(6.75\times\) over the existing hosted execution model. In the proposed model, energy consumption is reduced by up to \(14.49\times\) compared to the hosted one. We also compared our results with a state-of-the-art standalone execution approach that uses loop flattening and achieved a maximum speedup of \(4.80\times\), with an average of \(2.80\times\).
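To make the hosted-versus-standalone distinction concrete, the following is a minimal back-of-the-envelope sketch, not the cost model or compilation flow used in the paper. It assumes the textbook cycle count for a modulo-scheduled loop, (N - 1) * II + SL, and treats the host synchronization cost and the per-iteration outer-loop costs as hypothetical parameters; all function and parameter names are illustrative.

```python
def modulo_loop_cycles(trip_count, ii, schedule_length):
    """Cycles for one modulo-scheduled loop: (N - 1) * II + SL.

    ii is the initiation interval, schedule_length the length of one
    iteration's schedule (both in cycles).
    """
    if trip_count <= 0:
        return 0
    return (trip_count - 1) * ii + schedule_length


def hosted_cycles(outer_trips, inner_trips, ii, schedule_length,
                  sync_overhead, host_outer_cycles):
    """Hosted model (assumed): the host runs the outer loop and launches
    the CGRA once per outer iteration, paying sync_overhead each time."""
    inner = modulo_loop_cycles(inner_trips, ii, schedule_length)
    return outer_trips * (host_outer_cycles + sync_overhead + inner)


def standalone_cycles(outer_trips, inner_trips, ii, schedule_length,
                      sync_overhead, cgra_outer_cycles):
    """Standalone model (assumed): the whole nest runs on the CGRA, so the
    host synchronization cost is paid only once."""
    inner = modulo_loop_cycles(inner_trips, ii, schedule_length)
    return sync_overhead + outer_trips * (cgra_outer_cycles + inner)


if __name__ == "__main__":
    # Hypothetical numbers chosen only for illustration; they are not
    # measurements from the paper.
    params = dict(outer_trips=64, inner_trips=64, ii=2, schedule_length=8,
                  sync_overhead=200)
    hosted = hosted_cycles(host_outer_cycles=10, **params)
    standalone = standalone_cycles(cgra_outer_cycles=4, **params)
    print(f"hosted: {hosted} cycles, standalone: {standalone} cycles, "
          f"ratio: {hosted / standalone:.2f}x")
```

In this toy model the gap grows with the outer trip count and with the synchronization cost, which is the effect the abstract attributes to the hosted approach; the actual speedups reported above come from the authors' experiments, not from this sketch.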
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sunny, C., Das, S., Martin, K.J.M., Coussy, P. (2024). Standalone Nested Loop Acceleration on CGRAs for Signal Processing Applications. In: Dias, T., Busia, P. (eds) Design and Architectures for Signal and Image Processing. DASIP 2024. Lecture Notes in Computer Science, vol 14622. Springer, Cham. https://doi.org/10.1007/978-3-031-62874-0_7
DOI: https://doi.org/10.1007/978-3-031-62874-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-62873-3
Online ISBN: 978-3-031-62874-0
eBook Packages: Computer Science, Computer Science (R0)