MediaBreeze: a decoupled architecture for accelerating multimedia applications

Published: 01 December 2001

Abstract

Decoupled architectures are fine-grained processors that partition a computer program into memory access and execute functions and exploit the parallelism between the two. Although some concepts from the traditional decoupled access/execute paradigm made their way into commercial processors, they encountered resistance in general-purpose applications because these applications are not very structured or regular. However, multimedia applications have recently become the dominant workload on desktops and workstations. Media applications are highly structured and regular and lend themselves well to the decoupling concept. In this paper, we present an architecture that decouples the useful/true computations from the overhead/supporting instructions in media applications. The proposed scheme is incorporated into an out-of-order general-purpose processor enhanced with SIMD extensions. Explicit hardware support is provided to exploit instruction-level parallelism in the overhead component. Performance evaluation shows that such hardware can significantly improve performance over conventional SIMD-enhanced general-purpose processors. Results on nine multimedia benchmarks show that the proposed MediaBreeze architecture provides a 1.05x to 16.7x performance improvement over a 2-way out-of-order SIMD machine. With slip-based data prefetching added, a performance improvement of up to 28x is observed.
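The split between useful/true computation and overhead/supporting instructions can be illustrated in software. The following is a minimal sketch, not the MediaBreeze hardware or its instruction set: it assumes a simple dot-product kernel over a flat memory array, and the names `strided_addresses`, `operand_stream`, and `dot_product` are hypothetical helpers introduced only for this illustration. Address generation and loads (the overhead component) are kept in one stream, while the multiply-accumulate work (the useful component) sees only ready operands.

```python
from typing import Iterable, Iterator, List, Tuple


def strided_addresses(base: int, stride: int, count: int) -> Iterator[int]:
    """Overhead side: produce the address stream for one operand."""
    for i in range(count):
        yield base + i * stride


def operand_stream(
    memory: List[int], a_base: int, b_base: int, stride: int, count: int
) -> Iterator[Tuple[int, int]]:
    """Overhead side: address generation plus loads, delivered as operand pairs."""
    a_addrs = strided_addresses(a_base, stride, count)
    b_addrs = strided_addresses(b_base, stride, count)
    for a_addr, b_addr in zip(a_addrs, b_addrs):
        yield memory[a_addr], memory[b_addr]


def dot_product(operands: Iterable[Tuple[int, int]]) -> int:
    """Useful side: only the multiply-accumulate on operands that are already loaded."""
    acc = 0
    for a, b in operands:
        acc += a * b
    return acc


# Hypothetical data: two 4-element vectors stored back to back in one array.
memory = [1, 2, 3, 4, 10, 20, 30, 40]
result = dot_product(operand_stream(memory, a_base=0, b_base=4, stride=1, count=4))
print(result)
```

In this sketch the compute function contains no indexing, address arithmetic, or bounds logic. That separation is the software analogue of what the abstract describes; in hardware the two streams would run in parallel rather than interleaved as Python generators do.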
