DOI: 10.1145/2491845.2491859
research-article

Rapid, low-power loop execution in a network of functional units

Published: 19 September 2013

Abstract

The need for high-performance computing and low-power operation has led to the emergence of new processor architectures, with most recent designs combining multiple cores and multiple threads per core. In our work, we are exploring an architecture of multiple instruction pipelines that merge into a common back-end, formed as a network of functional units. In this paper we focus on the back-end and, in particular, on rapid, low-power execution of loops based on dataflow. We dispatch the loop body instructions to the network of functional units only once, and then let the loop execute in a dataflow manner, with no further instruction issue until the loop completes. In this way we not only speed up loop execution but also save energy, since the whole front end of the pipeline is unused while the loop executes and can be turned off. We have simulated the functional unit network at the microarchitecture level, running a number of Livermore loops. The results show that the proposed architecture can accelerate loop execution by up to N/k, for a network of N units, a loop body of N instructions, and an issue rate of k instructions per cycle.
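The N/k bound can be illustrated with a back-of-the-envelope cycle model. This is a hypothetical sketch, not the authors' simulator, and it rests on assumptions the abstract does not state: the conventional pipeline is purely issue-bound and must re-issue all N body instructions on every iteration at k per cycle; after a single dispatch, the network of N functional units starts one new iteration per cycle; and operation latencies, loop-carried dependences, memory stalls, and pipeline fill/drain are ignored. The function names `baseline_cycles`, `dataflow_cycles`, and `speedup` are illustrative only.

```python
import math


def baseline_cycles(n_instr: int, k: int, iterations: int) -> int:
    """Issue-bound conventional pipeline (hypothetical model).

    Assumes the front end re-issues every loop-body instruction on every
    iteration, k instructions per cycle, ignoring latencies and stalls.
    """
    return iterations * math.ceil(n_instr / k)


def dataflow_cycles(n_instr: int, k: int, iterations: int) -> int:
    """Dispatch-once dataflow network (hypothetical model).

    Assumes the body is dispatched once (ceil(n_instr / k) cycles), after
    which the functional units start one new iteration per cycle with no
    further instruction issue; fill/drain latency is ignored.
    """
    return math.ceil(n_instr / k) + iterations


def speedup(n_instr: int, k: int, iterations: int) -> float:
    """Ratio of baseline cycles to dataflow cycles under the assumptions above."""
    return baseline_cycles(n_instr, k, iterations) / dataflow_cycles(n_instr, k, iterations)


if __name__ == "__main__":
    # Network of N = 16 units, 16-instruction loop body, issue width k = 4.
    for iters in (10, 1_000, 1_000_000):
        print(iters, round(speedup(16, 4, iters), 3))
```

Under this model the one-time dispatch cost is amortized over the iterations, so the speedup rises toward N/k = 4 as the iteration count grows (strictly toward ceil(N/k) when k does not divide N). Real loops with long dependence chains or loop-carried dependences would fall short of this idealized limit.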

Cited By

  • (2015) Performance and power simulation of a functional-unit-network processor with SimpleScalar and Wattch. Proceedings of the 19th Panhellenic Conference on Informatics, pp. 71-76. DOI: 10.1145/2801948.2801958. Online publication date: 1-Oct-2015.

Published In

ACM Other conferences
PCI '13: Proceedings of the 17th Panhellenic Conference on Informatics
September 2013
359 pages
ISBN:9781450319690
DOI:10.1145/2491845
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • University of Macedonia
  • Aristotle University of Thessaloniki
  • The University of Sheffield
  • Alexander TEI of Thessaloniki

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dataflow
  2. loop acceleration
  3. multifunction units
  4. shared resources

Conference

PCI 2013
Sponsor:
  • The University of Sheffield
PCI 2013: 17th Panhellenic Conference on Informatics
September 19 - 21, 2013
Thessaloniki, Greece

Acceptance Rates

Overall Acceptance Rate 190 of 390 submissions, 49%

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 13 Feb 2025
