Performance-constrained pipelining of software loops onto reconfigurable hardware

Author:
Greg Snider

Hewlett-Packard Laboratories, Palo Alto, CA

FPGA '02: Proceedings of the 2002 ACM/SIGDA tenth international symposium on Field-programmable gate arrays, February 2002, pages 177–186
https://doi.org/10.1145/503048.503075

Published: 24 February 2002

ABSTRACT

Retiming and slowdown are algorithms that can be used to pipeline synchronous circuits. Iterative modulo scheduling is an algorithm for software pipelining in the presence of resource constraints. Integrating the best features of both yields a pipelining algorithm, retimed modulo scheduling, that can more effectively exploit the idiosyncrasies of reconfigurable hardware. It also fits naturally into a design-space exploration process that trades off speed against power, energy, or area.
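The abstract names retiming and slowdown without defining them. The sketch below is a minimal illustration, assuming the classic Leiserson and Saxe graph model: operators are nodes with a combinational delay, and each edge carries the number of registers on that wire. The three-operator loop, its delays, and the helpers `retime`, `slow_down`, `is_legal` and `clock_period` are hypothetical; this is not the paper's retimed modulo scheduling algorithm, only the two circuit transformations it builds on.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, int]  # (u, v, w): w registers on the wire u -> v


def retime(edges: List[Edge], r: Dict[str, int]) -> List[Edge]:
    """Leiserson-Saxe retiming: w_r(u, v) = w(u, v) + r(v) - r(u)."""
    return [(u, v, w + r.get(v, 0) - r.get(u, 0)) for u, v, w in edges]


def is_legal(edges: List[Edge]) -> bool:
    """A retiming is legal only if no edge ends up with a negative register count."""
    return all(w >= 0 for _, _, w in edges)


def slow_down(edges: List[Edge], c: int) -> List[Edge]:
    """c-slowdown: replace every register with c registers."""
    return [(u, v, c * w) for u, v, w in edges]


def clock_period(delay: Dict[str, int], edges: List[Edge]) -> int:
    """Largest total node delay along any path made only of register-free edges."""
    succ: Dict[str, List[str]] = {n: [] for n in delay}
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)

    memo: Dict[str, int] = {}
    on_stack: Set[str] = set()

    def longest_from(n: str) -> int:
        if n in memo:
            return memo[n]
        if n in on_stack:
            raise ValueError("register-free cycle: not a valid synchronous circuit")
        on_stack.add(n)
        memo[n] = delay[n] + max((longest_from(m) for m in succ[n]), default=0)
        on_stack.discard(n)
        return memo[n]

    return max(longest_from(n) for n in delay)


# Hypothetical loop body: a recurrence a -> b -> c -> a, each operator
# taking 3 time units, with a single register closing the loop.
delay = {"a": 3, "b": 3, "c": 3}
edges: List[Edge] = [("a", "b", 0), ("b", "c", 0), ("c", "a", 1)]

print(clock_period(delay, edges))  # 9: all three operators are chained

# Retiming alone cannot split the chain: spreading the single register
# would leave a negative register count on c -> a.
print(is_legal(retime(edges, {"a": 0, "b": 1, "c": 2})))  # False

# 3-slow the circuit, then retime so every edge carries one register.
pipelined = retime(slow_down(edges, 3), {"a": 0, "b": 1, "c": 2})
print(pipelined, clock_period(delay, pipelined))  # one register per edge, period 3
```

Retiming never changes the total register count around a cycle, so the single register on the loop caps the period at 9 until slowdown adds registers. The 3-slow version clocks three times faster but revisits any one data stream only every third cycle, so the gain materializes when three independent computations are interleaved.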

Index Terms
1. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory
      1. Reinforcement learning

Published in
FPGA '02: Proceedings of the 2002 ACM/SIGDA tenth international symposium on Field-programmable gate arrays
February 2002
257 pages
ISBN: 1581134525
DOI: 10.1145/503048
General Chair: Martine Schlag
Program Chair: Steve Trimberger
Copyright © 2002 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Qualifiers
- Article

Conference
Acceptance Rates
Overall acceptance rate: 125 of 627 submissions, 20%

Article Metrics
- Total citations: 18
- Total downloads: 610
- Downloads (last 12 months): 13
- Downloads (last 6 weeks): 0
