Standalone Nested Loop Acceleration on CGRAs for Signal Processing Applications

  • Conference paper
Design and Architectures for Signal and Image Processing (DASIP 2024)

Abstract

Coarse-Grained Reconfigurable Array (CGRA) architectures are becoming increasingly popular as low-power accelerators in compute- and data-intensive application domains such as security, multimedia, signal processing, and machine learning. The efficiency of a CGRA is determined by its architectural features and by the compiler's ability to exploit the spatio-temporal configuration. Numerous design optimizations and mapping techniques have been introduced in this direction. However, the execution model has been overlooked, despite its critical role in ensuring the efficient acceleration of applications. Most existing CGRA implementations follow a hosted approach, i.e., they execute only the modulo-scheduled innermost loop and entrust the outer loops to the host processor. This increases synchronization overhead with the host, diminishing the benefits of the acceleration provided by the CGRA. In this paper, we propose a compilation flow that supports efficient standalone execution of nested loops. Experiments show that the standalone execution model yields a maximum performance improvement of \(12.33\times\) and an average of \(6.75\times\) over the existing hosted execution model. In the proposed model, energy consumption is reduced by up to \(14.49\times\) compared to the hosted one. We also compare our results with a state-of-the-art standalone execution approach that uses loop flattening, achieving a maximum speedup of \(4.80\times\) and an average of \(2.80\times\).
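To make the terms in the abstract concrete, the following minimal Python sketch contrasts a two-level loop nest with its flattened form, the transformation used by the flattening-based standalone approach mentioned above. It is illustrative only and is not the compilation flow proposed in the paper. The function names (`nested_row_sums`, `flattened_row_sums`, `hosted_launches`) are hypothetical, and the launch count assumes a simplified hosted model in which the host starts the accelerator once per outer-loop iteration.

```python
def nested_row_sums(a, rows, cols):
    """Reference loop nest: row sums of a row-major matrix stored in a flat list."""
    out = [0] * rows
    for i in range(rows):        # outer loop: left to the host in a hosted model
        for j in range(cols):    # innermost loop: the part a hosted CGRA executes
            out[i] += a[i * cols + j]
    return out


def flattened_row_sums(a, rows, cols):
    """Same computation after loop flattening: a single loop over rows * cols."""
    out = [0] * rows
    for k in range(rows * cols):
        i, _j = divmod(k, cols)  # recover the original (i, j) indices from k
        out[i] += a[k]
    return out


def hosted_launches(rows):
    """Assumption: one accelerator launch (and one sync) per outer iteration."""
    return rows


if __name__ == "__main__":
    data = list(range(6))        # 2 x 3 matrix: [[0, 1, 2], [3, 4, 5]]
    print(nested_row_sums(data, 2, 3))
    print(flattened_row_sums(data, 2, 3))
    print(hosted_launches(2), "launches hosted vs. 1 launch standalone")
```

Under this simplified model, the number of host synchronizations in the hosted case grows with the outer trip count, whereas a standalone execution needs a single launch. The index recovery in the flattened loop (`divmod`) hints at the extra work flattening introduces into every iteration.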

Corresponding author

Correspondence to Satyajit Das.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sunny, C., Das, S., Martin, K.J.M., Coussy, P. (2024). Standalone Nested Loop Acceleration on CGRAs for Signal Processing Applications. In: Dias, T., Busia, P. (eds) Design and Architectures for Signal and Image Processing. DASIP 2024. Lecture Notes in Computer Science, vol 14622. Springer, Cham. https://doi.org/10.1007/978-3-031-62874-0_7

  • DOI: https://doi.org/10.1007/978-3-031-62874-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-62873-3

  • Online ISBN: 978-3-031-62874-0

  • eBook Packages: Computer Science, Computer Science (R0)
