DOI: 10.1145/503048.503075
Article

Performance-constrained pipelining of software loops onto reconfigurable hardware

Published: 24 February 2002

ABSTRACT

Retiming and slowdown are algorithms that can be used to pipeline synchronous circuits. Iterative modulo scheduling is an algorithm for software pipelining in the presence of resource constraints. Integrating the best features of both yields a pipelining algorithm, retimed modulo scheduling, that can more effectively exploit the idiosyncrasies of reconfigurable hardware. It also fits naturally into a design space exploration process that trades off speed against power, energy, or area.
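
As a rough illustration of the retiming idea mentioned in the abstract, the sketch below applies the standard Leiserson-Saxe rule w_r(u, v) = w(u, v) + r(v) - r(u) to a tiny three-node loop and compares the clock period before and after. It is not the paper's retimed modulo scheduling algorithm. The graph, the delays, the lag values, and the function names (`retime`, `is_legal`, `clock_period`) are all hypothetical, chosen only for illustration.

```python
from functools import lru_cache

def retime(edges, r):
    """Retimed register count per edge: w_r(u, v) = w(u, v) + r[v] - r[u]."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}

def is_legal(edges):
    """A retiming is legal only if no edge ends up with a negative register count."""
    return all(w >= 0 for w in edges.values())

def clock_period(delay, edges):
    """Longest combinational path delay, following only zero-register edges.

    Assumes every cycle carries at least one register, so the
    zero-register subgraph is acyclic and the recursion terminates.
    """
    succ = {u: [] for u in delay}
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest(u):
        return delay[u] + max((longest(v) for v in succ[u]), default=0)

    return max(longest(u) for u in delay)

# Hypothetical loop: a -> b -> c -> a, with both registers on the back edge.
delay = {"a": 1, "b": 3, "c": 3}
edges = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 2}

# Hypothetical lags: move one register from the back edge onto b -> c.
lags = {"a": 0, "b": 0, "c": 1}
retimed = retime(edges, lags)

before = clock_period(delay, edges)    # 7: a, b, c form one combinational path
after = clock_period(delay, retimed)   # 4: the longest path is now a -> b
```

The number of registers around the loop is unchanged (two before, two after); only their placement moves, which is what shortens the critical path in this toy case.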

    Published in

      FPGA '02: Proceedings of the 2002 ACM/SIGDA tenth international symposium on Field-programmable gate arrays
      February 2002
      257 pages
      ISBN: 1581134525
      DOI: 10.1145/503048

      Copyright © 2002 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate: 125 of 627 submissions, 20%
