Instruction-level experimental evaluation of the Multiflow TRACE 14/300 VLIW computer

Schuette, Michael A.; Shen, John P.

doi:10.1007/BF01205186

Published: May 1993

Volume 7, pages 249–271, (1993)

The Journal of Supercomputing



Abstract

Advances in compiler technology have recently led to the introduction of the architectural paradigm known as the very long instruction word (VLIW) architecture. The Multiflow TRACE series of processors is the first commercial line of processors with this architecture. This article presents experimental results concerning the performance and resource utilization of the TRACE 14/300 on a set of 11 common scientific programs written in both C and FORTRAN. Several characteristics of the application, architecture, implementation, and compiler that contribute to the observed results are identified. These characteristics include a conservative approach by the compiler in determining the existence of data dependence and disambiguating memory references, memory latency and resource dependences resulting from the TRACE 14/300 implementation, and actual data dependences that exist within the code. Alleviating the effects of the first three of these bottlenecks is found to improve the TRACE 14/300 performance by a factor of 1.55 on average. Performance of the TRACE 14/300 is also measured on several standard benchmarks, including the SPEC89 benchmark suite. Performance on the SPEC89 benchmarks is found to be comparable to the superscalar IBM RS/6000 when differences in implementation technology are considered. Concluding remarks concerning instruction-level parallel processing are also presented.



Author information

Authors and Affiliations

Michael A. Schuette: Software Systems Research Laboratory, Motorola, Inc., 3701 Algonquin Rd., Suite 600, Rolling Meadows, IL 60008
John P. Shen: Center for Dependable Systems, Electrical & Computer Engineering Dept., Carnegie Mellon University, Pittsburgh, PA 15213


About this article

Cite this article

Schuette, M.A., Shen, J.P. Instruction-level experimental evaluation of the Multiflow TRACE 14/300 VLIW computer. J Supercomput 7, 249–271 (1993). https://doi.org/10.1007/BF01205186


Received: 15 March 1992
Accepted: 15 October 1992
Issue Date: May 1993
DOI: https://doi.org/10.1007/BF01205186
