Standalone Nested Loop Acceleration on CGRAs for Signal Processing Applications

Sunny, Chilankamol; Das, Satyajit; Martin, Kevin J. M.; Coussy, Philippe

doi:10.1007/978-3-031-62874-0_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14622)

Included in the following conference series:

International Workshop on Design and Architecture for Signal and Image Processing

Abstract

Coarse-Grained Reconfigurable Array (CGRA) architectures are becoming increasingly popular as low-power accelerators in compute- and data-intensive application domains such as security, multimedia, signal processing, and machine learning. The efficiency of a CGRA is determined by its architectural features and by the compiler's ability to exploit the spatio-temporal configuration. Numerous design optimizations and mapping techniques have been introduced in this direction. However, the execution model has been overlooked, despite its critical role in ensuring the efficient acceleration of applications. Most existing CGRA implementations follow a hosted approach, i.e., they execute the modulo-scheduled innermost loop and entrust the outer loops to the host processor. This increases synchronization overhead with the host, diminishing the benefits of acceleration provided by the CGRA. In this paper, we propose a compilation flow that supports efficient standalone execution of nested loops. Experiments show that the standalone execution model leads to a maximum of $12.33\times$ and an average of $6.75\times$ performance improvement compared to the existing hosted execution model. In the proposed model, energy consumption is reduced by up to $14.49\times$ compared to that of the hosted one. We also compared our results with state-of-the-art standalone execution that uses loop flattening and achieved a maximum speedup of $4.80\times$, with an average of $2.80\times$.
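
For readers unfamiliar with the comparison point named in the abstract, the following is a minimal sketch of loop flattening in general. It is not the compilation flow proposed in the paper, and it does not model a CGRA. The function names, the summation kernel, and the choice of Python are illustrative assumptions. The first function keeps the two-level nest, where a hosted model would typically leave the outer loop to the host; the second collapses the nest into a single loop and recovers the original indices from one counter.

```python
def nested_sum(a, rows, cols):
    """Two-level loop nest (hypothetical kernel, for illustration only)."""
    total = 0
    for i in range(rows):        # outer loop: left to the host in a hosted model
        for j in range(cols):    # innermost loop: the part usually modulo scheduled
            total += a[i][j]
    return total


def flattened_sum(a, rows, cols):
    """The same computation with the nest flattened into one loop."""
    total = 0
    for k in range(rows * cols):  # single loop over the whole iteration space
        i, j = divmod(k, cols)    # recover the original (i, j) indices
        total += a[i][j]
    return total


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6]]
    print(nested_sum(data, 2, 3), flattened_sum(data, 2, 3))
```

Both functions visit the same iteration space in the same order. The flattened form needs no outer-loop control, but it pays for index recovery on every iteration, which is one plausible source of overhead in flattening-based approaches.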

Author information

Authors and Affiliations

IIT Palakkad, Kerala, India
Chilankamol Sunny & Satyajit Das
Univ. Bretagne-Sud, UMR 6285, Lab-STICC, 56100, Lorient, France
Kevin J. M. Martin & Philippe Coussy

Corresponding author

Correspondence to Satyajit Das.

Editor information

Editors and Affiliations

Instituto Superior de Engenharia de Lisboa, Lisbon, Portugal
Tiago Dias
Università degli Studi di Cagliari, Cagliari, Italy
Paola Busia

About this paper

Cite this paper

Sunny, C., Das, S., Martin, K.J.M., Coussy, P. (2024). Standalone Nested Loop Acceleration on CGRAs for Signal Processing Applications. In: Dias, T., Busia, P. (eds) Design and Architectures for Signal and Image Processing. DASIP 2024. Lecture Notes in Computer Science, vol 14622. Springer, Cham. https://doi.org/10.1007/978-3-031-62874-0_7

DOI: https://doi.org/10.1007/978-3-031-62874-0_7
Published: 22 June 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-62873-3
Online ISBN: 978-3-031-62874-0
eBook Packages: Computer Science, Computer Science (R0)
