A Simulation Study of Decoupled Vector Architectures

The Journal of Supercomputing

Abstract

Decoupling techniques can be applied to a vector processor, resulting in a large increase in the performance of vectorizable programs. We simulate a selection of programs from the Perfect Club and SPECfp92 benchmark suites and compare their execution time on a conventional single-port vector architecture with that on a decoupled vector architecture. Decoupling increases performance by a factor greater than 1.4 for realistic memory latencies, and even for an ideal memory system with zero latency there is still a speedup of as much as 1.3. A significant portion of this paper is devoted to studying the tradeoffs involved in choosing a suitable size for the queues of the decoupled architecture. The hardware cost of the queues need not be large to achieve most of the performance advantages of decoupling.
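The queue-size tradeoff mentioned in the abstract can be illustrated with a toy timing model. The sketch below is not the authors' simulator and does not model vector instructions or the benchmark programs. It assumes a hypothetical stream of `n` independent loads, a fixed memory `latency`, a single memory port that issues at most one load per cycle, a fixed `compute` time per element, and a load queue with `queue_size` slots. All function names, parameters, and numbers are illustrative assumptions.

```python
def coupled_cycles(n, latency, compute):
    """Baseline without decoupling (assumed model): every element
    waits for its load and is then computed, with no overlap."""
    return n * (latency + compute)


def decoupled_cycles(n, latency, compute, queue_size):
    """Toy decoupled model: an access unit runs ahead of the execute
    unit and deposits loaded data into a bounded queue.

    Assumptions (illustrative only):
      * one load can be issued per cycle (single memory port),
      * a queue slot is reserved when a load is issued and released
        when the execute unit starts consuming that element,
      * the execute unit processes elements in order.
    """
    if queue_size < 1:
        raise ValueError("queue_size must be at least 1")

    issue = [0] * n   # cycle in which load i is issued
    start = [0] * n   # cycle in which the execute unit starts element i
    finish_prev = 0   # cycle in which the previous element finished

    for i in range(n):
        # The memory port accepts one new load per cycle.
        earliest = issue[i - 1] + 1 if i > 0 else 0
        # A slot is free only once element i - queue_size was dequeued.
        if i >= queue_size:
            earliest = max(earliest, start[i - queue_size])
        issue[i] = earliest
        # Execution waits for both the data and the previous element.
        start[i] = max(issue[i] + latency, finish_prev)
        finish_prev = start[i] + compute

    return finish_prev


if __name__ == "__main__":
    n, latency, compute = 100, 20, 4
    base = coupled_cycles(n, latency, compute)
    for q in (1, 2, 4, 8, 16, 64):
        cycles = decoupled_cycles(n, latency, compute, q)
        print(f"queue_size={q:3d}  cycles={cycles:5d}  speedup={base / cycles:.2f}")
```

In this toy model the cycle count stops improving once the queue is deep enough to cover the memory latency, which mirrors qualitatively the abstract's point that the queues need not be large. The printed speedups are artifacts of the chosen parameters and should not be compared with the figures reported in the paper.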

Cite this article

Espasa, R., Valero, M. A Simulation Study of Decoupled Vector Architectures. The Journal of Supercomputing 14, 124–152 (1999). https://doi.org/10.1023/A:1008158808410

  • DOI: https://doi.org/10.1023/A:1008158808410
