Abstract
Advances in compiler technology have recently led to the introduction of the architectural paradigm known as the very long instruction word (VLIW) architecture. The Multiflow Trace series of processors is the first commercial line of processors with this architecture. This article presents experimental results concerning the performance and resource utilization of the TRACE 14/300 on a set of 11 common scientific programs written in both C and FORTRAN. Several characteristics of the application, architecture, implementation, and compiler that contribute to the observed results are identified. These characteristics include a conservative approach by the compiler in determining the existence of data dependence and disambiguating memory references, memory latency and resource dependences resulting from the TRACE 14/300 implementation, and actual data dependences that exist within the code. Alleviating the effects of the first three of these bottlenecks is found to improve the TRACE 14/300 performance by a factor of 1.55 on average. Performance of the TRACE 14/300 is also measured on several standard benchmarks, including the SPEC89 benchmark suite. Performance on the SPEC89 benchmarks is found to be comparable to the superscalar IBM RS/6000 when differences in implementation technology are considered. Concluding remarks concerning instruction-level parallel processing are also presented.
Schuette, M.A., Shen, J.P. Instruction-level experimental evaluation of the Multiflow TRACE 14/300 VLIW computer. J Supercomput 7, 249–271 (1993). https://doi.org/10.1007/BF01205186