A CGRA Definition Framework for Dataflow Applications

  • Conference paper
Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC 2020)

Abstract

Executing complex scientific applications on Coarse Grain Reconfigurable Arrays (CGRAs) promises reduced execution time and/or energy consumption compared to software execution or even customized hardware solutions. The compute core of a CGRA architecture is a cell that typically consists of simple, generic hardware units, such as ALUs or simple processors, or even custom logic tailored to an application’s specific characteristics. However, generality in the cell contents, while convenient for serving multiple applications, comes at the cost of lower execution acceleration and higher energy consumption.

This work proposes a novel Mixed-CGRA Definition Framework (MC-DeF) targeting a Mixed-CGRA architecture that leverages the advantages of CGRAs, by utilizing a customized cell array, and of FPGAs, by utilizing a separate LUT array for adaptability. Our framework employs a custom cell structure and functionality definition phase to create highly customized application/domain-specific CGRA designs. This is achieved through cost functions that combine metrics such as resource usage, connectivity overhead, and occupied chip area, among others, with user-defined threshold values. Thus, the framework aids the user in creating suitable designs based on the application’s needs and/or design restrictions, such as energy and/or area constraints.
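As a rough illustration of how a threshold-based cost function of this kind might be used, the Python sketch below scores candidate cells with a weighted sum of normalized metrics and keeps those at or below a user-defined threshold. The abstract does not give the actual MC-DeF cost functions, so the weighted-sum form, the `CandidateCell` type, the metric fields, and all numeric values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateCell:
    """Hypothetical candidate cell; every metric is normalized to [0, 1]."""
    name: str
    resource_usage: float
    connectivity_overhead: float
    area: float


def cost(cell, weights):
    """Weighted sum of the cell's metrics (an assumed form, not MC-DeF's)."""
    return (weights["resource_usage"] * cell.resource_usage
            + weights["connectivity_overhead"] * cell.connectivity_overhead
            + weights["area"] * cell.area)


def select_cells(candidates, weights, threshold):
    """Keep cells whose cost is at or below the threshold, cheapest first."""
    scored = [(cost(c, weights), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s <= threshold]
    return [c for _, c in sorted(kept, key=lambda pair: pair[0])]


# Illustrative values only; they are not taken from the paper.
weights = {"resource_usage": 0.5, "connectivity_overhead": 0.25, "area": 0.25}
candidates = [
    CandidateCell("A", 0.2, 0.4, 0.4),
    CandidateCell("B", 0.8, 0.8, 0.8),
    CandidateCell("C", 0.4, 0.0, 0.8),
]
selected = select_cells(candidates, weights, threshold=0.5)
print([c.name for c in selected])
```

Raising or lowering `threshold` in this sketch trades the number of accepted candidate cells against their cost, which is the kind of user-controlled trade-off the abstract describes.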

We evaluate our framework using three applications, Hayashi-Yoshida, Mutual Information, and Transfer Entropy, and present fully functional, FPGA-based implementations of these applications to demonstrate the validity of our framework. Comparisons with related work show that MC-DeF performs favourably in terms of processing throughput, even when compared with much larger designs, and uses fewer resources than most of the compared architectures. It also makes better use of the underlying architecture, recording the second-best efficiency (LUT/GOPs) rating.
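The LUT/GOPs efficiency rating can be read as the number of LUTs spent per unit of throughput. The sketch below computes that ratio and ranks designs by it, assuming that a lower value means better efficiency. The design names and figures are placeholders, not results from the paper.

```python
def luts_per_gops(luts, gops):
    """LUTs used per GOPs of throughput; lower is assumed to be better."""
    if gops <= 0:
        raise ValueError("throughput must be positive")
    return luts / gops


def rank_by_efficiency(designs):
    """Return design names ordered from most to least efficient."""
    return sorted(designs, key=lambda name: luts_per_gops(*designs[name]))


# Placeholder (LUT count, GOPs) pairs; not measurements from the paper.
designs = {
    "design_x": (1000, 10.0),
    "design_y": (500, 10.0),
    "design_z": (2000, 50.0),
}
ranking = rank_by_efficiency(designs)
print(ranking)
```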

This research is supported in part by the General Secretariat for Research and Technology (GSRT) and the Hellenic Foundation for Research and Innovation (HFRI).



Author information

Corresponding author

Correspondence to George Charitopoulos.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Charitopoulos, G., Pnevmatikatos, D.N. (2020). A CGRA Definition Framework for Dataflow Applications. In: Rincón, F., Barba, J., So, H., Diniz, P., Caba, J. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2020. Lecture Notes in Computer Science, vol 12083. Springer, Cham. https://doi.org/10.1007/978-3-030-44534-8_21

  • DOI: https://doi.org/10.1007/978-3-030-44534-8_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-44533-1

  • Online ISBN: 978-3-030-44534-8

  • eBook Packages: Computer Science, Computer Science (R0)
