Abstract
Decoupled architectures are fine-grain processors that partition the memory access and execute functions of a computer program and exploit the parallelism between the two. Although some concepts from the traditional decoupled access/execute paradigm made their way into commercial processors, they encountered resistance in general-purpose applications because these applications are not very structured or regular. However, multimedia applications have recently become the dominant workload on desktops and workstations. Media applications are highly structured and regular and lend themselves well to the decoupling concept. In this paper, we present an architecture that decouples the useful (true) computations from the overhead (supporting) instructions in media applications. The proposed scheme is incorporated into an out-of-order general-purpose processor enhanced with SIMD extensions. Explicit hardware support is provided to exploit instruction-level parallelism in the overhead component. Performance evaluation shows that such hardware can significantly improve performance over conventional SIMD-enhanced general-purpose processors. Results on nine multimedia benchmarks show that the proposed MediaBreeze architecture provides a 1.05x to 16.7x performance improvement over a 2-way out-of-order SIMD machine. With slip-based data prefetching, a performance improvement of up to 28x is observed.
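The split between useful computation and overhead can be illustrated with a small software analogy. The sketch below is not the MediaBreeze hardware or its instruction set; the function names (`access_stream`, `execute_stream`, `decoupled_dot`) and the queue are hypothetical, chosen only to show how loop control, address generation, and loads can be separated from the multiply-accumulate work in a dot product.

```python
from collections import deque


def access_stream(a, b, queue):
    """Overhead side (hypothetical): loop control, indexing, and loads."""
    for i in range(len(a)):          # loop counter update and branch
        queue.append((a[i], b[i]))   # address arithmetic and two loads


def execute_stream(queue):
    """Useful side (hypothetical): only the multiply-accumulate work."""
    acc = 0
    while queue:
        x, y = queue.popleft()       # operands arrive ready to use
        acc += x * y                 # the true computation
    return acc


def decoupled_dot(a, b):
    """Run the two sides one after the other through a shared queue."""
    queue = deque()
    access_stream(a, b, queue)
    return execute_stream(queue)


print(decoupled_dot([1, 2, 3], [4, 5, 6]))
```

Here the two sides run sequentially for simplicity. In a decoupled processor they would proceed concurrently, with the queue letting the access side run ahead of the execute side.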
Index Terms
- MediaBreeze: a decoupled architecture for accelerating multimedia applications