Power-Awarness in Coarse-Grained Reconfigurable Multi-Functional Architectures: a Dataflow Based Strategy

Journal of Signal Processing Systems

Abstract

Modern embedded systems, to accommodate different applications or functionalities over the same substrate and provide flexibility at the hardware level, are often resource redundant and, consequently, power hungry. Therefore, dedicated design frameworks are required to implement efficient runtime reconfigurable platforms. To address this scenario, such frameworks also need to offer application-specific support for power management. In this work, we adopt dataflow specifications as a starting point to feature power minimization in coarse-grained reconfigurable embedded systems. The proposed flow is composed of two subsequent steps: 1) the characterization of the optimal topological system specification(s) and 2) the identification of disjoint logic regions. The latter are then used to implement clock and power gating methodologies. The validity of this model-based approach has been proved on the reconfigurable computing core of a multi-functional coprocessor for image processing applications. Results have been assessed targeting both a 90 nm and a 45 nm ASIC technology.

Notes

  1. Such as the availability of dedicated cells and processes on the implementation stack.

  2. Xilinx boards, for example, are equipped with dedicated blocks (BUFGs), whose outputs can drive distinct regions of logic powering down different design portions (when enabled).

  3. In the worst-case scenario, where all the input DPNs share the same actor, the chain will have as many elements as there are iterations.

  4. A combination is a selection of all or part of a set of objects, regardless of the order in which they are selected. Given A, B and C, the complete list of possible selections of two items would be: AB, AC, and BC.

  5. The back-annotation requires performing a training set of synthesis trials. In this paper we have used the RTL Compiler of Cadence SoC Encounter and a 90 nm CMOS technology.

  6. Only the static contribution to the power consumption is considered.

  7. SoC Encounter has been used to extract the CP associated with the N different input specifications, each synthesized stand-alone with a 90 nm CMOS technology.

  8. Coefficients in Eq. 4 are technology dependent and have to be modeled for each target technology node by interpolating a training set of experimental results. This form derives from experiments carried out by means of the RTL Compiler (Cadence SoC Encounter), using ASIC CMOS 90 nm technology as reference, varying the number of SBoxes (from 1 to 100) and the number of bits of SBoxes data (1, 8, 16, 32, 64).

  9. By kernel contribution we mean the power consumption of the design when the considered kernel is enabled.
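
The definition of a combination in note 4 can be checked with a short sketch. It is only an illustration of the definition, not code from the paper: the helper names `k_combinations` and `all_combinations` are hypothetical, and the labels A, B and C are the ones used in the note.

```python
from itertools import combinations


def k_combinations(items, k):
    """Return every order-independent selection of k items, as joined labels."""
    return ["".join(c) for c in combinations(items, k)]


def all_combinations(items):
    """Return selections of 'all or part' of the set: every size from 1 to len(items)."""
    return [
        label
        for k in range(1, len(items) + 1)
        for label in k_combinations(items, k)
    ]


if __name__ == "__main__":
    # Selections of two items out of A, B and C, as listed in note 4.
    print(k_combinations("ABC", 2))
    # All non-empty selections of A, B and C.
    print(all_combinations("ABC"))
```

For n objects there are 2^n - 1 non-empty selections, so the number of candidates grows quickly with the size of the set.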

Acknowledgments

Dr. Carlo Sau is grateful to the Sardinia Regional Government for funding the RPCT Project (L.R. 7/2007, CRP-18324) that led to these results. Carlo Sau and Tiziana Fanni are grateful to the Sardinia Regional Government for supporting their PhD scholarships (P.O.R. F.S.E., European Social Fund 2007-2013 - Axis IV Human Resources).

Author information

Corresponding author

Correspondence to Francesca Palumbo.

Appendix A: MDC Generated CPF

[Figure h: MDC generated CPF (image not reproduced)]

Cite this article

Palumbo, F., Fanni, T., Sau, C. et al. Power-Awarness in Coarse-Grained Reconfigurable Multi-Functional Architectures: a Dataflow Based Strategy. J Sign Process Syst 87, 81–106 (2017). https://doi.org/10.1007/s11265-016-1106-9

  • DOI: https://doi.org/10.1007/s11265-016-1106-9
