Abstract

In this paper, we consider the performance gain that can be obtained by using three previously proposed enhancements in concert: aggressive dynamic (run-time) instruction scheduling, the reuse of decoded instructions, and trace scheduling. Both aggressive dynamic instruction scheduling and decoded instruction reuse have been used in commercial systems. We show that these three enhancements complement and support one another. Each has been shown to have merit in its own right; we claim that, when they are used in concert, the overall advantage is greater than that obtained by using any one singly. To support this claim, we present results from running benchmarks representing several common multimedia kernels. The simulations show 7.3 instructions completed per cycle on the best-performing benchmark for a reasonably aggressive microarchitecture that combines trace scheduling of decoded instructions (i.e., decoded traces) with aggressive dynamic execution.


Bishop, B., Kelliher, T.P., Owens, R.M. et al. Aggressive Dynamic Execution of Decoded Traces. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 22, 65–75 (1999). https://doi.org/10.1023/A:1008125919892
