Design Space Exploration of Distributed Loop Buffer Architectures with Incompatible Loop-Nest Organisations in Embedded Systems

Artes, Antonio; Fasthuber, Robert; Ayala, Jose L.; Raghavan, Praveen; Catthoor, Francky

doi:10.1007/s11265-013-0749-z

Published: 11 May 2013

Journal of Signal Processing Systems, Volume 72, pages 69–85 (2013)


Abstract

The use of distributed loop buffer architectures with incompatible loop-nest organisations allows incompatible loops to execute in parallel with minimal hardware overhead. These distributed and scalable architectures are therefore a promising option for improving the energy efficiency of the instruction memory organisations in embedded systems. This paper proposes and analyses non-overlapping, complementary implementation options for distinct partitions of the design space of distributed loop buffer architectures. A high-level trade-off analysis of the proposed implementations is crucial to present the design process that an embedded systems designer has to follow to obtain an efficient distributed loop buffer architecture for a given application. Results show that, at the cost of an increase of about 6.5% in the energy consumption of the control logic of the instruction memory organisation, the overall energy consumption of the instruction memory organisation can be reduced by 6% to 22% when distributed loop buffer architectures with incompatible loop-nest organisations are used instead of clustered loop buffer architectures with shared loop-nest organisations.
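The trade-off reported in the abstract (control-logic energy up by about 6.5%, overall energy down by 6% to 22%) can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the paper. It assumes a hypothetical share of the baseline energy spent in control logic (`CONTROL_SHARE`, an invented value) and computes how much the remaining part of the instruction memory organisation would have to save for the overall figures to hold.

```python
def required_noncontrol_saving(control_share, control_increase, overall_saving):
    """Fractional saving needed outside the control logic.

    All energies are normalised so that the baseline total is 1.0:
    control_share is spent in control logic, the rest elsewhere.
    """
    if not 0.0 <= control_share < 1.0:
        raise ValueError("control_share must be in [0, 1)")
    # Control logic becomes more expensive (about 6.5% in the abstract).
    new_control = control_share * (1.0 + control_increase)
    # Total energy after the reported overall saving.
    target_total = 1.0 - overall_saving
    # Energy that may remain in the non-control part.
    new_rest = target_total - new_control
    return 1.0 - new_rest / (1.0 - control_share)


# Hypothetical assumption: control logic is 10% of the baseline energy.
CONTROL_SHARE = 0.10

if __name__ == "__main__":
    for overall in (0.06, 0.22):  # range quoted in the abstract
        saving = required_noncontrol_saving(CONTROL_SHARE, 0.065, overall)
        print(f"overall saving {overall:.0%} -> non-control saving {saving:.1%}")
```

Under this assumed split, the non-control part has to save slightly more than the overall figure, because it must also absorb the extra control-logic energy. The actual split depends on the architecture and application evaluated in the paper.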



Acknowledgments

This work is supported by the Spanish Ministry of Science and Innovation, under grant BES-2009-023681, and the Spanish Ministry of Economy and Competitiveness, under grant TEC2012-33892.

Author information

Authors and Affiliations

Computer Science Faculty, Complutense University of Madrid, Madrid, Spain
Antonio Artes & Jose L. Ayala
Smart Systems and Energy Technology Department, IMEC, Leuven, Belgium
Robert Fasthuber, Praveen Raghavan & Francky Catthoor

Corresponding author

Correspondence to Antonio Artes.

About this article

Cite this article

Artes, A., Fasthuber, R., Ayala, J.L. et al. Design Space Exploration of Distributed Loop Buffer Architectures with Incompatible Loop-Nest Organisations in Embedded Systems. J Sign Process Syst 72, 69–85 (2013). https://doi.org/10.1007/s11265-013-0749-z

Received: 15 October 2012
Accepted: 03 April 2013
Published: 11 May 2013
Issue Date: July 2013
DOI: https://doi.org/10.1007/s11265-013-0749-z
