Aggressive Dynamic Execution of Decoded Traces

Bishop, Benjamin; Kelliher, Thomas P.; Owens, Robert M.; Irwin, Mary Jane

doi:10.1023/A:1008125919892

Abstract

In this paper, we consider the increased performance that can be obtained by using three previously proposed enhancements in concert: aggressive dynamic (run-time) instruction scheduling, the reuse of decoded instructions, and trace scheduling. Both aggressive dynamic instruction scheduling and decoded instruction reuse have been used in commercial systems. We show that these three enhancements complement and support one another. Hence, while each has been shown to have merit in its own right, we claim that the overall advantage when they are used in concert is greater than that obtained by using any one singly. To support this claim, we present results from running benchmarks representing several common multimedia kernels. Simulations of a reasonably aggressive microarchitecture that combines trace scheduling of decoded instructions (i.e., decoded traces) with aggressive dynamic execution show 7.3 instructions completed per cycle for the best-performing benchmark.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802
Benjamin Bishop, Robert M. Owens & Mary Jane Irwin
Department of Mathematics and Computer Science, Goucher College, Baltimore, MD, 21204
Thomas P. Kelliher

About this article

Cite this article

Bishop, B., Kelliher, T.P., Owens, R.M. et al. Aggressive Dynamic Execution of Decoded Traces. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 22, 65–75 (1999). https://doi.org/10.1023/A:1008125919892

Published: 01 August 1999
Issue Date: August 1999
DOI: https://doi.org/10.1023/A:1008125919892