Abstract
Pipelining is a widely used technique for implementing architectures that have inherent temporal parallelism when there is an operational requirement for high throughput. Many variations on the basic theme have been proposed, with varying degrees of success. The aim of this paper is to present a critical review of conventional pipelined architectures and put some well-known problems in sharp relief. It is argued that conventional pipelined architectures have underlying limitations that can only be dealt with by adopting a different view of pipelining. These limitations are explained in terms of discontinuities in the flow of instructions and data, and representative machines are examined in support of this argument. In a companion paper [Topham, Omondi and Ibbett, 1988] we examine an alternative approach to the design of pipelined architectures and introduce an alternative theory of pipelining, which we call Context Flow.
References
AMD. 1987. AMD 2900 user's manual. Publication no. 08996A, Advanced Micro Devices, Sunnyvale, Calif.
CDC. 1977. Control Data 7600/Cyber 70 Model 76 computer systems: Hardware reference manual. Control Data Corporation, St. Paul, Minn.
CDC. 1981. Control Data Cyber 200 Model 205: Hardware reference manual. Control Data Corporation, St. Paul, Minn.
Chen, T. C. 1971. Parallelism, pipelining, and computer efficiency. Computer Design, 10, 1 (Jan.), 69–74.
CRAY. 1976. CRAY-1 computer system: Hardware reference manual. Cray Research, Mendota Heights, Minn.
Ditzel, D. R., and McLellan, H. R. 1987. Branch folding in the CRISP microprocessor: Reducing branch delay to zero. In Conference Proceedings-14th Annual International Symposium on Computer Architecture (Pittsburgh, June 2–5), pp. 2–9.
Edwards, D. B. G., Knowles, A. E., and Woods, J. V. 1980. MU6-G: A new design to achieve mainframe performance from a mini-sized computer. In Conference Proceedings-7th Annual International Symposium on Computer Architecture (La Baule, May 6–8), IEEE Computer Society Press, pp. 161–167.
Fisher, J. A. 1987. VLIW architectures: Supercomputing via overlapped execution. In Conference Proceedings-2nd International Conference on Supercomputing (Santa Barbara, May 3–8), International Supercomputing Institute, pp. 353–361.
Gibbons, P. B., and Muchnick, S. S. 1986. Efficient instruction scheduling for a pipelined architecture. In Conference Proceedings—SIGPLAN Symposium on Compiler Construction (Palo Alto, June 25–27), Association for Computing Machinery, pp. 11–16.
Goodman, J. R., Hsieh, J., Liou, K., Pleszkun, A. R., Schechter, P. B., and Young, H. C. 1985. PIPE: A VLSI decoupled architecture. In Conference Proceedings-12th Annual International Symposium on Computer Architecture (Boston, June 17–19), IEEE Computer Society Press, pp. 20–27.
Hennessy, J. L. 1984. VLSI processor architecture. IEEE Transactions on Computers, C-33, 4 (Apr.), 1221–1246.
Hill, M. D. 1987. Aspects of cache memory and instruction buffer performance. Report No. UCB/CSD 87/381, Computer Science Division, University of California-Berkeley.
Hintz, R. G., and Tate, D. P. 1972. Control Data STAR-100 processor design. In Conference Proceedings-FALL COMPCON (San Francisco, Sept. 12–14), IEEE Computer Society Press, pp. 1–4.
Hockney, R. W., and Jesshope, C. R. 1981. Parallel Computers. Adam Hilger, Bristol, UK.
Holgate, R. W., and Ibbett, R. N. 1980. An analysis of instruction-fetching strategies in pipelined computers. IEEE Transactions on Computers, C-29, 4 (Apr.), 325–329.
Hwang, K., and Briggs, F. A. 1984. Computer Architecture and Parallel Processing. McGraw-Hill, New York.
Ibbett, R. N. 1981. Vector processing. In Conference Proceedings-International Computing Symposium, Westbury House, pp. 337–341.
Ibbett, R. N. 1982. The Architecture of High Performance Computers. Springer-Verlag, New York.
Ibbett, R. N., Capon, P. C., and Topham, N. P. 1985. MU6-V: A parallel vector processing system. In Conference Proceedings-12th Annual International Symposium on Computer Architecture (Boston, June 17–19), IEEE Computer Society Press, pp. 136–144.
Kunkel, S. R., and Smith, A. J. 1986. Optimal pipelining in supercomputers. In Conference Proceedings-13th Annual International Symposium on Computer Architecture (Tokyo, June 2–5), IEEE Computer Society Press, pp. 404–411.
Lee, J. F. K., and Smith, A. J. 1984. Branch prediction strategies and branch target buffer design. IEEE Computer, 17, 1 (Jan.), 6–22.
Lin, Q. 1986. Design of a vector processor. Journal of Computer Science and Technology, 1, 1: 26–34.
Lincoln, N. R. 1982. Technology and design tradeoffs in the creation of a modern supercomputer. IEEE Transactions on Computers, C-31, 5 (May), 349–362.
McFarling, S., and Hennessy, J. 1986. Reducing the cost of branches. In Conference Proceedings—13th Annual International Symposium on Computer Architecture (Tokyo, June 2–5), IEEE Computer Society Press, pp. 396–403.
Miura, K. 1986. Fujitsu's supercomputer: Facom Vector Processor system. In Supercomputers: Class VI Systems, Hardware and Software, S. Fernbach, ed., North-Holland, Amsterdam, The Netherlands, pp. 137–152.
Morris, D., and Ibbett, R. N. 1979. The MU5 Computer System. Springer-Verlag, New York.
Myers, G. J. 1982. Advances in Computer Architecture. Wiley, New York.
Murphy, J. O., and Wade, R. M. 1970. The IBM 360/195. Datamation, (Apr.), 72–79.
Odaka, T., Nagashima, S., and Kawabe, S. 1986. Hitachi supercomputer S-810 array processor system. In Supercomputers: Class VI Systems, Hardware and Software, S. Fernbach, ed., North-Holland, Amsterdam, The Netherlands, pp. 113–136.
Omondi, A. R., and Brock, J. D. 1987. Micromultiprogramming a vector pipeline. Internal Report, Department of Computer Science, University of North Carolina, Chapel Hill (in preparation).
Patterson, D. A., Garrison, P., Hill, M., Lioupis, D., Nyberg, C., Sippel, T., and Van Dyke, K. 1983. Architecture of a VLSI instruction cache for a RISC. In Conference Proceedings-10th International Symposium on Computer Architecture (Stockholm, June 13–17), IEEE Computer Society Press, pp. 108–118.
Radin, G. 1983. The IBM 801 minicomputer. IBM Journal of Research and Development, 27, 3: 237–246.
Ramamoorthy, C. V., and Li, H. F. 1977. Pipeline architecture. ACM Computing Surveys, 9, 1 (Mar.), 61–102.
Sequin, C. A. 1983. Design and implementation of RISC I. In VLSI Architecture, B. Randell and P. C. Treleaven, eds., Prentice-Hall, Englewood Cliffs, N.J., pp. 276–298.
Thompson, J. R. 1986. The CRAY-1, the CRAY X-MP, the CRAY-2 and beyond: The supercomputers of Cray Research. In Supercomputers: Class VI Systems, Hardware and Software, S. Fernbach, ed., North-Holland, Amsterdam, The Netherlands, pp. 69–82.
Thornton, J. E. 1970. Design of a Computer: The Control Data 6600. Scott, Foresman, Glenview, Ill.
Tomasulo, R. M. 1967. An efficient algorithm for exploiting multiple arithmetic units. IBM Journal of Research and Development, 11, 1: 25–33.
Topham, N. P. 1987. Performance analysis of a data-driven multiple vector processing system. In Highly Parallel Computers, G. L. Reijns and M. H. Barton, eds., North-Holland, Amsterdam, The Netherlands, pp. 111–125.
Topham, N. P., Omondi, A. R., and Ibbett, R. N. 1988. Context Flow: An alternative to conventional pipelined architectures. The Journal of Supercomputing, to appear in Vol. 2, No. 1.
Tucker, S. G. 1986. The IBM 3090: An overview. IBM Systems Journal, 25, 1: 4–19.
Watanabe, T., Katayama, H., and Iwaya, A. 1986. Introduction of the NEC supercomputer SX system. In Supercomputers: Class VI Systems, Hardware and Software, S. Fernbach, ed., North-Holland, Amsterdam, The Netherlands, pp. 153–168.
Cite this article
Topham, N.P., Omondi, A. & Ibbett, R.N. On the design and performance of conventional pipelined architectures. J Supercomput 1, 353–393 (1988). https://doi.org/10.1007/BF00128488
DOI: https://doi.org/10.1007/BF00128488