A Theory for Co-Scheduling Hardware and Software Pipelines in ASIPs and Embedded Processors

Design Automation for Embedded Systems


Abstract

Exploiting instruction-level parallelism (ILP) is essential for achieving high performance in application-specific instruction-set processors (ASIPs) and embedded processors. Unlike conventional general-purpose processors, ASIPs and embedded processors typically run a single application and hence must be optimized extensively for that application to extract maximum performance. Further, the low-power and low-cost requirements of ASIPs may demand reuse of pipeline stages, resulting in pipelines with complex structural hazards. In such architectures, exploiting higher ILP is a major challenge for the designer.

Existing techniques deal with either scheduling hardware pipelines to obtain higher throughput or software pipelining, an instruction scheduling technique for iterative computation, to exploit greater ILP. We integrate these techniques to co-schedule hardware and software pipelines and thereby achieve greater instruction throughput. In this paper, we develop the underlying theory of Co-Scheduling, called the Modulo-Scheduled Pipeline (MS-pipeline) theory. More specifically, we establish the necessary and sufficient condition for achieving the maximum throughput in a given pipeline operating under modulo scheduling. We also establish a sufficient condition for achieving a specified throughput and, based on it, develop a methodology for designing hardware pipelines that achieve such a throughput. Finally, we present initial experimental results that help establish the usefulness of MS-pipeline theory in software pipelining. Because the proposed theory helps to analyze and improve the throughput of MS-pipelines, it is especially useful in designing ASIPs and embedded processors.
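
To make the notion of structural hazards under modulo scheduling concrete, the following is a minimal Python sketch of the standard modulo resource constraint on a reservation table. It is not the MS-pipeline theory developed in the paper; the three-stage pipeline, the stage names, and the helper functions are hypothetical and chosen only for illustration. The sketch assumes every operation uses the same reservation table and that the schedule repeats with a fixed period (initiation interval).

```python
from itertools import combinations

# Hypothetical reservation table: stage -> cycles (relative to issue) in which
# an operation occupies that stage. Stage "S1" is reused two cycles after issue,
# which is the kind of stage reuse that creates structural hazards.
RESERVATION_TABLE = {
    "S1": {0, 2},
    "S2": {1},
    "S3": {3},
}

def forbidden_latencies(table):
    """Distances between two uses of the same stage by a single operation."""
    return {
        abs(a - b)
        for cycles in table.values()
        for a, b in combinations(sorted(cycles), 2)
    }

def self_compatible(table, period):
    """True if one operation issued every `period` cycles never collides with itself."""
    return all(latency % period != 0 for latency in forbidden_latencies(table))

def offsets_compatible(table, period, offsets):
    """True if operations issued at `offsets`, repeated every `period` cycles,
    never occupy the same stage in the same cycle (checked modulo `period`)."""
    for cycles in table.values():
        occupied = set()
        for offset in offsets:
            for cycle in cycles:
                slot = (offset + cycle) % period
                if slot in occupied:
                    return False  # two uses of this stage wrap onto the same slot
                occupied.add(slot)
    return True

if __name__ == "__main__":
    print(forbidden_latencies(RESERVATION_TABLE))
    print(self_compatible(RESERVATION_TABLE, 2), self_compatible(RESERVATION_TABLE, 3))
    print(offsets_compatible(RESERVATION_TABLE, 4, [0, 1]))
```

In this hypothetical table, the reuse of `S1` forbids a period of 2, whereas a period of 4 admits two initiations at offsets 0 and 1. Which issue patterns a pipeline admits at a given period is the kind of question the abstract's throughput conditions address.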

Cite this article

Govindarajan, R., Altman, E.R. & Gao, G.R. A Theory for Co-Scheduling Hardware and Software Pipelines in ASIPs and Embedded Processors. Design Automation for Embedded Systems 6, 243–275 (2002). https://doi.org/10.1023/A:1014050303852

  • DOI: https://doi.org/10.1023/A:1014050303852