On the design and performance of conventional pipelined architectures

The Journal of Supercomputing

Abstract

Pipelining is a widely used technique for implementing architectures that have inherent temporal parallelism when there is an operational requirement for high throughput. Many variations on the basic theme have been proposed, with varying degrees of success. The aim of this paper is to present a critical review of conventional pipelined architectures and put some well-known problems in sharp relief. It is argued that conventional pipelined architectures have underlying limitations that can only be dealt with by adopting a different view of pipelining. These limitations are explained in terms of discontinuities in the flow of instructions and data, and representative machines are examined in support of this argument. In a companion paper [Topham, Omondi and Ibbett, 1988] we examine an alternative approach to the design of pipelined architectures and introduce an alternative theory of pipelining, which we call Context Flow.

References

  • AMD. 1987. AMD 2900 user's manual. Publication no. 08996A, Advanced Micro Devices, Sunnyvale, Calif.

  • CDC. 1977. Control Data 7600/Cyber 70 Model 76 computer systems: Hardware reference manual. Control Data Corporation, St. Paul, Minn.

  • CDC. 1981. Control Data Cyber 200 Model 205: Hardware reference manual. Control Data Corporation, St. Paul, Minn.

  • Chen, T. C. 1971. Parallelism, pipelining, and computer efficiency. Computer Design, 10, 1 (Jan.), 69–74.

  • CRAY. 1976. CRAY-1 computer system: Hardware reference manual. Cray Research, Mendota Heights, Minn.

  • Ditzel, D. R., and McLellan, H. R. 1987. Branch folding in the CRISP microprocessor: Reducing branch delay to zero. In Conference Proceedings-14th Annual International Symposium on Computer Architecture (Pittsburgh, June 2–5), pp. 2–9.

  • Edwards, D. B. G., Knowles, A. E., and Woods, J. V. 1980. MU6-G: A new design to achieve mainframe performance from a mini-sized computer. In Conference Proceedings-7th Annual International Symposium on Computer Architecture (La Baule, May 6–8), IEEE Computer Society Press, pp. 161–167.

  • Fisher, J. A. 1987. VLIW architectures: Supercomputing via overlapped execution. In Conference Proceedings-2nd International Conference on Supercomputing (Santa Barbara, May 3–8), International Supercomputing Institute, pp. 353–361.

  • Gibbons, P. B., and Muchnick, S. S. 1986. Efficient instruction scheduling for a pipelined architecture. In Conference Proceedings-SIGPLAN Symposium on Compiler Construction (Palo Alto, June 25–27), Association for Computing Machinery, pp. 11–16.

  • Goodman, J. R., Hsieh, J., Liou, K., Pleszkun, A. R., Schechter, P. B., and Young, H. C. 1985. PIPE: A VLSI decoupled architecture. In Conference Proceedings-12th Annual International Symposium on Computer Architecture (Boston, June 17–19), IEEE Computer Society Press, pp. 20–27.

  • Hennessy, J. L. 1984. VLSI processor architecture. IEEE Transactions on Computers, C-33, 4 (Apr.), 1221–1246.

  • Hill, M. D. 1987. Aspects of cache memory and instruction buffer performance. Report No. UCB/CSD 87/381, Computer Science Division, University of California-Berkeley.

  • Hintz, R. G., and Tate, D. P. 1972. Control Data STAR-100 processor design. In Conference Proceedings-FALL COMPCON (San Francisco, Sept. 12–14), IEEE Computer Society Press, pp. 1–4.

  • Hockney, R. W., and Jesshope, C. R. 1981. Parallel Computers. Adam Hilger, Bristol, UK.

  • Holgate, R. W., and Ibbett, R. N. 1980. An analysis of instruction-fetching strategies in pipelined computers. IEEE Transactions on Computers, C-29, 4 (Apr.), 325–329.

  • Hwang, K., and Briggs, F. A. 1984. Computer Architecture and Parallel Processing. McGraw-Hill, New York.

  • Ibbett, R. N. 1981. Vector processing. In Conference Proceedings-International Computing Symposium, Westbury House, pp. 337–341.

  • Ibbett, R. N. 1982. The Architecture of High Performance Computers. Springer-Verlag, New York.

  • Ibbett, R. N., Capon, P. C., and Topham, N. P. 1985. MU6-V: A parallel vector processing system. In Conference Proceedings-12th Annual International Symposium on Computer Architecture (Boston, June 17–19), IEEE Computer Society Press, pp. 136–144.

  • Kunkel, S. R., and Smith, A. J. 1986. Optimal pipelining in supercomputers. In Conference Proceedings-13th Annual International Symposium on Computer Architecture (Tokyo, June 2–5), IEEE Computer Society Press, pp. 404–411.

  • Lee, J. F. K., and Smith, A. J. 1984. Branch prediction strategies and branch target buffer design. IEEE Computer, 17, 1 (Jan.), 6–22.

  • Lin, Q. 1986. Design of a vector processor. Journal of Computer Science and Technology, 1, 1: 26–34.

  • Lincoln, N. R. 1982. Technology and design tradeoffs in the creation of a modern supercomputer. IEEE Transactions on Computers, C-31, 5 (May), 349–362.

  • McFarling, S., and Hennessy, J. 1986. Reducing the cost of branches. In Conference Proceedings-13th Annual International Symposium on Computer Architecture (Tokyo, June 2–5), IEEE Computer Society Press, pp. 396–403.

  • Miura, K. 1986. Fujitsu's supercomputer: Facom Vector Processor system. In Supercomputers: Class VI Systems, Hardware and Software, S. Fernbach, ed., North-Holland, Amsterdam, The Netherlands, pp. 137–152.

  • Morris, D., and Ibbett, R. N. 1979. The MU5 Computer System. Springer-Verlag, New York.

  • Myers, G. J. 1982. Advances in Computer Architecture. Wiley, New York.

  • Murphy, J. O., and Wade, R. M. 1970. The IBM 360/195. Datamation (Apr.), 72–79.

  • Odaka, T., Nagashima, S., and Kawabe, S. 1986. Hitachi supercomputer S-810 array processor system. In Supercomputers: Class VI Systems, Hardware and Software, S. Fernbach, ed., North-Holland, Amsterdam, The Netherlands, pp. 113–136.

  • Omondi, A. R., and Brock, J. D. 1987. Micromultiprogramming a vector pipeline. Internal Report, Department of Computer Science, University of North Carolina, Chapel Hill (in preparation).

  • Patterson, D. A., Garrison, P., Hill, M., Lioupis, D., Nyberg, C., Sippel, T., and Van Dyke, K. 1983. Architecture of a VLSI instruction cache for a RISC. In Conference Proceedings-10th International Symposium on Computer Architecture (Stockholm, June 13–17), IEEE Computer Society Press, pp. 108–118.

  • Radin, G. 1983. The IBM 801 minicomputer. IBM Journal of Research and Development, 27, 3: 237–246.

  • Ramamoorthy, C. V., and Li, H. F. 1977. Pipeline architecture. ACM Computing Surveys, 9, 1 (Mar.), 61–102.

  • Sequin, C. A. 1983. Design and implementation of RISC I. In VLSI Architecture, B. Randell and P. C. Treleaven, eds., Prentice-Hall, Englewood Cliffs, N.J., pp. 276–298.

  • Thompson, J. R. 1986. The CRAY-1, the CRAY X-MP, the CRAY-2 and beyond: The supercomputers of Cray Research. In Supercomputers: Class VI Systems, Hardware and Software, S. Fernbach, ed., North-Holland, Amsterdam, The Netherlands, pp. 69–82.

  • Thornton, J. E. 1970. Design of a Computer: The Control Data 6600. Scott, Foresman, Glenview, Ill.

  • Tomasulo, R. M. 1967. An efficient algorithm for exploiting multiple arithmetic units. IBM Journal of Research and Development, 11, 1: 25–33.

  • Topham, N. P. 1987. Performance analysis of a data-driven multiple vector processing system. In Highly Parallel Computers, G. L. Reijns and M. H. Barton, eds., North-Holland, Amsterdam, The Netherlands, pp. 111–125.

  • Topham, N. P., Omondi, A. R., and Ibbett, R. N. 1988. Context Flow: An alternative to conventional pipelined architectures. The Journal of Supercomputing, to appear in Vol. 2, No. 1.

  • Tucker, S. G. 1986. The IBM 3090: An overview. IBM Systems Journal, 25, 1: 4–19.

  • Watanabe, T., Katayama, H., and Iwaya, A. 1986. Introduction of the NEC supercomputer SX system. In Supercomputers: Class VI Systems, Hardware and Software, S. Fernbach, ed., North-Holland, Amsterdam, The Netherlands, pp. 153–168.

Cite this article

Topham, N.P., Omondi, A. & Ibbett, R.N. On the design and performance of conventional pipelined architectures. J Supercomput 1, 353–393 (1988). https://doi.org/10.1007/BF00128488
